ABSTRACT


Substantial genetic diversity exists among the people of Pakistan, and understanding
the evolution of this diversity is complicated by several waves of migration from
populations in the North and Northwest. Pathans, one of the largest ethnic groups of
Pakistan, reside mostly in the northwestern part of the country and have remained
unexplored with respect to their biological affinity and haplotypic diversity; they are
the subject of this exploration. This dissertation reports the results of a PhD project in
which we assessed the extent of genetic diversity using Y-STR profiling and mtDNA
sequence analysis of the HVI control region in Pathan populations of the Mardan and
Charsada districts of KP province, Pakistan. A total of 374 buccal swabs were collected
from five major populations of the two adjoining districts, Charsada and Mardan. Y-STR
profiling was carried out using the Promega PowerPlex® Y23 System. Mitochondrial HVI
data were also generated for 165 samples. Y-chromosomal and mtDNA haplogroups were
assigned to each sample using a haplogroup predictor and PhyloTree, respectively.
Principal Coordinate Analysis (PCoA) plots were generated by combining our data set
with other published datasets from neighboring populations of Central Asia, the Middle
East, Europe and South Asia. Our results revealed that the most frequent Y-chromosomal
haplogroup was R1a (49.2%), followed by G2a (17.9%) and L (9.6%), while the mtDNA HVI
macrohaplogroups R (63.4%), M (26.8%) and N (8.6%) were the most frequent among
Pathans of the area. Some novel mitochondrial haplogroups (mtHgs) of M and R
subclades have also been recorded for the first time for Pathans. Analysis of our results
revealed no significant genetic substructure among the five populations with respect to
the 23 Y-STRs. Furthermore, the Pathans of Charsada and Mardan cluster together with
little genetic difference. When compared on the basis of Y-STR profiles, these people
show affinity with previously studied Pathans from Pakistan, the Yousafzai tribe from
the north of Pakistan, Pathans from Afghanistan and some populations from the Russian
Federation. The clustering pattern was different when the comparison was based on
mitochondrial HVI data sets from neighboring populations, which suggests more
male-mediated than female-mediated gene flow among populations. In addition, the
mitochondrial gene pool of these populations seems to have been influenced by the
Turkish invasion. The gene pool of the Pathan populations of Mardan and Charsada
districts exhibits genetic affinities with other Pathans from Pakistan and Afghanistan and
shares haplogroups with Middle Eastern, Caucasian, South Asian and Central Asian
people. The study provides a sound reference for collating the molecular anthropology
and genetic structure of the people of Pakistan.


Chapter 1
INTRODUCTION 


Curiosity about the origin of mankind has always compelled the human mind to seek
logical answers. The key motivation behind this inquiry, and behind the search for
human genetic variation, has been the challenge of exploring population structure in
general and the genetics of disease in particular. The genetic variation present in modern
people also reflects the past of mankind: its origin, the spread of anatomically modern
humans and their demographic history. It is increasingly evident that knowledge of these
processes is equally indispensable for a proper understanding of the genetics of complex
diseases.


Decades of research on genetic variation have revealed that a significant degree of
diversity lies both within and between existing human populations around the globe.
The enormous amount of data gathered during the relatively long “classical era”
highlighted many basic problems but left them largely unresolved. The increasingly
informative “DNA era”, which has expanded rapidly over the last 35 years, took up the
same list of problems and is formulating new ones. Largely irrespective of which general
questions about demographic history are being asked, present-day genetics investigates
variation in three different systems: the biparentally inherited autosomes, the paternally
inherited Y chromosome and the maternally inherited mitochondrial DNA. Although
the autosomes are many times larger than the Y chromosome and mtDNA, the
uniparentally inherited markers have powerful advantages that make them the tools of
choice for population geneticists. Besides their uniparental mode of inheritance, their
lack of recombination is of prime importance, because it allows genetic lineages to be
reconstructed back to their most recent common ancestors (MRCA), i.e. the
Y-chromosomal Adam and the mitochondrial Eve. Furthermore, in combination with
information about variation in autosomal genes, a promising synthesis may be possible
in the future. This thesis is about Y-chromosomal STR haplotype and mtDNA HVI
variation in the major Pathan populations (tribes) of two historically important,
geographically adjacent areas that remained under the influence of many foreign
invaders. To understand the Y-chromosomal STR and mtDNA HVI variation in these
populations and its origin, additional background knowledge of the populations living
outside this area is needed. Owing to their particular properties, Y-STRs and mtDNA
allow individual genealogies to be constructed and connected to a global phylogenetic
tree of all humans. The basal branches of the Y-haplogroup and mtDNA trees have been
shown to be highly continent specific, making these uniparentally inherited loci more
valuable markers than other tools, whose overall variation lies mainly within
populations. The study of the spread and variability of Y haplogroups and mtDNA
clusters enables scientists to address questions related to the early population
movements of anatomically modern humans.




Through the collective effort of many labs and researchers around the globe, the
Y-chromosomal and mitochondrial DNA clusters (haplogroups) have been investigated
and explored in different continental regions.


South Asia, comprising India, Pakistan, the countries of the sub-Himalayan region and
Myanmar, was one of the first geographical regions to be inhabited by modern humans.
This region has served as a major route of dispersal to other geographical regions,
including Southeast Asia. Populations of Pakistan show high genetic differentiation and
extensive structuring as a result of evolutionary antiquity and commonly practiced
endogamy. Linguistic variation among populations provides the best account of the
genetic differences observed in this region of the world.


The progression of local population expansion and consequent dispersal to newer areas,
coupled with the evolution of cultural practices and with geographical distance acting as
a barrier to human contact, also resulted in the formation of intra-marrying groups:
groups within which there was considerable gene exchange among individuals, but
between which there was little genetic exchange. The development and evolution of
language and culture reinforced the formation of such groups.

Genetic diversity within a geographical area is a logical indicator of the age of the
population's habitation in that area. However, it also depends on the population's
effective size, which means that assessing antiquity may not be simple. It is now well
established that Africa exhibits the highest genetic diversity among continental
populations. The genetic diversity of the Indian subcontinent, “the heartland of South
Asia”, is second only to that of Africa. Molecular genetic studies based on mitochondrial
DNA (mtDNA) and the non-recombining region of the Y chromosome (NRY) have
supported a single southern dispersal route to the Indian subcontinent. The populations
of the Indian subcontinent, including Pakistan and India, are all derivatives of the
mitochondrial M and N lineages, which themselves derive from the L3 lineage that is
now found only in Africa. The M and N lineages probably diverged from L3 shortly after
their dispersal from Africa (Quintana et al., 1999). Similarly, Y-chromosomal data are
also indicative of a southern dispersal route; this is supported by the presence of the C
and D Y-haplogroups only in Asia and Oceania (Kivisild et al., 2003; Underhill et al.,
2001) and not in western Eurasia or North Africa.

Afghanistan, a landlocked nation in southwest Central Asia, has the central highlands
of the Hindu Kush Mountains extending from the northeast of the country to the
southwest, separating it into northern and southern provinces. Lying at the intersection
of Central Asia, South Asia and the Middle East, Afghanistan has served as a crossroads
for human migrations and pilgrimages, including as an important stop along the Silk
Road. Mesolithic artifacts, Neolithic pottery of about 7.2 kya, bones of domesticated
animals and tools such as the sickle blades used to collect wild grasses have been
discovered during excavations at the Ghar-i-Mar site in northern Afghanistan. These
discoveries support the early cultivation of wheat and barley around 9-11 kya and the
domestication of animals around 7-9 kya (Dupree et al., 1972). Other recent archeological
findings include Buddhist artifacts transported northward from India along the Silk
Road, as well as captions and inscriptions engraved on rocks in ancient Hebrew dating
from the eleventh to the thirteenth centuries (Katzir, 2001). Historians place the onset of
urbanization in Afghanistan around 4 to 4.5 kya (Runion, 2007). According to historical
records, Aryans from Iran were the first to occupy Afghanistan (eighth century B.C.E.),
followed by Persians in the sixth century B.C.E. and Greeks in the fourth century B.C.E.
Later, Mauryans from India and Greco-Bactrians brought Buddhism into this region,
while Arabs and Mongols introduced Islam (Runion, 2007). Dari and Pashto are two
languages of Indo-European origin spoken by the two major ethnic groups of
Afghanistan, the Tajiks and the Pashtuns or Pathans, respectively. Pathans, the most
prevalent ethnic group in Afghanistan (42%), primarily inhabit the area south of the
Hindu Kush mountains. According to legend and oral tradition, these Pathans are
considered to be the ancestors of the Pathans of Pakistan and northern India. Different
origins for the Pathans, including Greek and Jewish ancestry, have been suggested
(Runion, 2007; Caroe, 1976). These connections have not been confirmed, and only a few
efforts have been made to characterize the genetic structure of this group in Pakistan
(Mohyuddin et al., 2001; Qamar et al., 2002; Mansoor et al., 2004; Sengupta et al., 2006;
Firasat et al., 2007), Afghanistan (Di Cristofaro, 2012) and India (Noor et al., 2009).
Because previous studies were carried out without regard to the tribal and subtribal
status of the Pashtuns, a study that resolves them at the subtribal and caste level was
greatly needed.


The present study was undertaken to ascertain, for the first time, the genetic diversity of
major Pathan tribes of Pakistan residing in the Charsada and Mardan districts of KP
province, such as the Yousafzai, Mohmand, Muhammadzai and Kakakhel Mian, using
Y-chromosomal STR profiles and mtDNA HVI sequence variation. The data produced
were subsequently compared with previously published data from geographically and
ethnically connected indigenous and global populations to explore paternal and
maternal signals of modern human dispersals across the study area. In addition, we
assessed genetic affinities between Pathans from our study area and Pathans from
neighboring Afghanistan as putative descendants of the same Y-chromosomal lineage,
as well as their hypothesized phylogenetic relation to Greek and Jewish populations.


Based on legendary accounts, it was hypothesized that the populations under
investigation are descendants of immigrants from the Pathan ethnic group of southern
Afghanistan and should therefore have close genetic ties with their suspected ancestors.
On the basis of this hypothesis, the following questions were raised to be addressed by
this research.


1. What clusters (haplogroups) are spread among the populations of Charsada and
Mardan?

2. How do these clusters connect to the global tree?

3. What is the diversity (%) of Y haplogroups and mtDNA haplogroups in these
populations compared with the populations that invaded these areas and ruled them
for a long time?

4. What has been the impact of subsequent gene flow between these populations?

5. What has been the contribution of invading populations to the maternal and paternal
gene pools of the populations under study?


The literature review of this thesis therefore focuses on the out-of-Africa migration and
the colonization of the South Asian land mass by modern humans, the legendary
ancestry of the populations under investigation, the history of the study area and,
finally, the general skeleton tree connecting and characterizing Y haplogroups and
mtDNA haplogroups worldwide and in South Asia specifically.


Chapter 2
REVIEW OF LITERATURE 


Charles Darwin was a pioneer in suggesting the common African ancestry of human
beings (Lafrenier, 2010). On the basis of some shared attributes, he further proposed
African apes to be their ancestors (Bowler et al., 2003). His proposition was later rejected,
and after several genome analyses of people from different racial backgrounds it was
concluded that all modern humans evolved from one 'lucky mother' in Africa about
150,000 years ago (Cann et al., 1987; Vigilant et al., 1991). Continuing developments in
molecular genetics and the availability of analytical tools have revolutionized molecular
anthropology, a field that also needs to be strengthened in Pakistan.


2.1 Colonization of South Asia 


According to one account, human populations started dispersing around 125,000 years
before present (BP) along two routes: one from the Nile Valley towards the Middle East
around 120,000-100,000 years BP, and a second through the Bab-el-Mandeb Strait on the
Red Sea, crossing the Arabian Peninsula and heading towards the present-day United
Arab Emirates around 125,000 years BP and Oman around 106,000 years BP (Lafrenier,
2010), subsequently reaching the Indian subcontinent at Jwalapuram around 75,000 years
BP. Despite the lack of any fossil record, the similarity among the stone tools collected
from all these places suggests that modern humans were their makers (Bower, 2011).
These findings might lend some support to the claim that modern humans from Africa
arrived in southern China about 100,000 years BP (Liu et al., 2010), and to the Liujiang
hominid controversially dated at 139,000-111,000 years BP (Shena et al., 2002). Dating
results for the Lunadong (Bubing Basin, Guangxi, southern China) teeth, which include
a right upper second molar and a left lower second molar, indicate that the molars may
be as old as 126,000 years.


For the earlier exoduses from Africa, genetic analyses based on the Y chromosome and
the mitochondrial genome do not provide any clue, as these represent only a small part
of human genetic material. It could be assumed that those modern humans became
extinct because of the Toba catastrophe around 74,000 years BP. However, according to
some arguments, this catastrophe had no remarkable impact on the human population
(Balter, 2010).


Some other authorities claim that Homo sapiens first appeared around 200,000 years ago
in Ethiopia (White et al., 2003). After reaching the Near East around 125,000 years ago,
they moved back to Africa, as their settlements were replaced by Neanderthals. It is now
strongly believed that the first modern human dispersal across Asia started about 75,000
years ago across the Bab-el-Mandeb, linking Ethiopia and Yemen. From there, some of
these people moved on to South Asia around 50,000 years ago and then to Australia by
46,000 years ago (Bowler et al., 2003). H. sapiens reached Europe around 43,000 years ago
(Wilford, 2011), replacing the Neanderthals around 24,000 years ago. They reached East
Asia some 30,000 years ago.


Several languages and cultures also spread along with these human migrations. The
initial exodus out of Africa 125,000 years ago was followed by several waves of migration
via the Arabian Peninsula into Eurasia around 60,000 years ago, with one group of
migrants passing rapidly along the coastal areas of the Indian Ocean and another group
migrating north towards Central Asia.


Genetic evidence indicates that modern humans went through a genetic bottleneck that
reduced genetic diversity, followed by dramatic population growth from geographically
dispersed populations around 50,000 years ago. This idea of a bottleneck is also
supported by Henry Harpending's findings, as well as by geological and climatological
evidence from the Lake Toba eruption (Harpending et al., 2009), which reduced the
overall population to as few as 15,000 individuals and consequently led to rapid racial
differentiation as a possible consequence of the founder effect and increased genetic
drift (Ambrose, 1998).


According to some genetic evidence, the migration out of Africa occurred along two
different routes. However, other investigations indicate that a single migration occurred,
followed by a swift northern migration of a subset of the group. The group that followed
the southern route from West Asia spread generation by generation around the coasts of
Arabia and Persia, finally reached the Indo-Pak subcontinent and headed towards
Australia around 55,000 to 40,000 years ago (Fig. 1). The group directed towards the
north (East Asians were the second group) ventured inland (Maca et al., 2001) and spread
to Europe, ultimately displacing the Neanderthals. They also radiated to India from
Central Asia (Bowler et al., 2003). Some recent evidence suggests that human migration
from Africa to Southeast Asia and Australia occurred between 80,000 and 120,000 years
ago (Ewen, 2015) following multiple routes, and that H. sapiens interbred with other
species such as Neanderthals in Europe, Denisovans in Central Asia and Homo erectus
(Dannel, 2012).


Fig.1. Probable human out of Africa migration time scale. (Source:
http://www.roperld.com/YBiallelicHaplogroups.htm#routes)


2.2 Pakistani civilization 

Pakistan is a diverse country, with a variety of cultures and ethnic groups among its
more than 180 million people. Historically an important trading hub since the ancient
Indus Valley and Gandhara civilizations, Pakistan has been exposed to many invaders
and traders from around the world, including Africa, the Middle East, the Caucasus, the
Mediterranean and China. This influx has contributed to the vibrant mix of people in
modern-day Pakistan. The majority of the Pakistani population belongs to Indo-European
ethnic groups comprising various smaller ethnic groupings. Punjabis, the largest group,
constitute almost 44% of the population; Pathans, the second largest ethnic group,
constitute 15%; Sindhis 14%; Saraikis 10%; Muhajirs 7%; Baluchis 3%; and the remaining
4% of the population consists of Hindkowans, Gujjars, Kashmiris, Potoharis, Chitralis,
Hunzakuts, Dards, Baltis and several other small groups.


2.3 The Pathans 


Pathans, the Pashto-speaking people, also called Pashtuns, Pakhtuns or Pukhtuns and
broadly referred to as ethnic Afghans (Banuazizi et al., 1994), belong to an Eastern
Iranian ethno-linguistic group residing for the most part in eastern and southern
Afghanistan and in Khyber Pakhtunkhwa Province, the Federally Administered Tribal
Areas and parts of Baluchistan Province of Pakistan. They are characterized by the
Pashto language, their culture, honor and the pre-Islamic code of conduct, Pukhtunwali,
that they practice (Oliver, 1980). Their unified modern past is sometimes traced to the
rise of the Durrani Empire in 1747.


The majority of Pathan tribes live in an area stretching from western Pakistan to
southwestern Afghanistan. Additional Pathan communities reside in the Northern
Areas, Azad Kashmir and Sindh provinces of Pakistan, as well as throughout Afghanistan
and in the Iranian region of Khorasan. A large migrant worker community lives in the
countries of the Arabian Peninsula, and smaller communities live in Europe and North
America. A sizable community of largely putative ancestry calls India home. Important
metropolitan centers of Pathan culture include Mardan and Kandahar. In addition,
Peshawar, Quetta and Kabul are ethnically mixed cities with large Pathan populations.
With 1.5 million ethnic Pathans, Karachi is the largest Pathan city in the world.


Pathans, who make up over 15.4 percent of Pakistan's population (25.6 million people),
are widely dispersed across the country, inhabiting a vast area from Khyber to Bolan,
including Peshawar, Quetta and Karachi (the largest Pathan city in the world) (Shehzad,
2007). They are also found in large numbers in Afghanistan, with an estimated
population of 12.5 million people (42%), as well as in Khorasan (Iran) and India (Lewis,
2009).


2.3.1 History and origin


The Pathan areas have faced many foreign invasions, by Aryans, Medes, Persians,
Mauryans, Scythians, Kushans, Greeks, Hephthalites, Turks, Mongols and Arabs. They
have an ancient history, much of it still unresearched. Various scholars and researchers
have proposed different theories about the origin of the Pathans.


2.3.2 Historical references 


Herodotus, the famous Greek historian, first referred to a people called Pactyans,
inhabiting Arachosia, the eastern frontier of the Persian satrapy, in the 1st millennium
B.C.E. (Rawlinson, 1858). Another reference, to a tribe called Pakhtas residing in the
region of Pakhat in Afghanistan, is given in the Rig-Veda, and some historians have
hypothesized them to be the early ancestors of the Pathans. Others have linked the
Pathans to the Bactrians, who used a similar Middle Iranian dialect.


Historically, Pathans are referred to as ethnic Afghans. The term Pathan was used as a
synonym of Afghan until the dawn of modern Afghanistan and the establishment of the
Durand Line, a 250 km long border drawn by Britain's Mortimer Durand in 1893.
According to various scholars, the term Afghan was first used in 982 C.E. in the
Hudud-al-Alam (Minorsky, 1940). The Pathans consider Afghan to be their renowned
common ancestor. According to some historians, the Pathans are believed to have
emerged in the region of Kandahar and the Suleiman Mountains and to have started
expanding thousands of years ago. At their place of origin they would have been in close
proximity to the Persians and Mauryans, and may have followed Zoroastrianism,
Buddhism, Hinduism or Judaism before the advent of Arab Muslims in the seventh
century (Dupree, 1977; Holdich, 1899).


2.3.3 Anthropology and linguistics 


The Pathans have been considered the modern-day descendants of Scythians. They speak
Pashto, a sub-branch of the Indo-Iranian branch of the Indo-European family of
languages. Pashto is of ancient origin and resembles the Bactrian and Avestan
languages. Pamir dialects, such as Ossetic, Shughni and Wakhi, are considered to be its
closest modern relatives. Pashto has borrowed many words from other languages such
as Sanskrit and Persian. Foreign invaders have also left their linguistic footprints on
Pashto, which has borrowed many words from Arabic, Greek and Turkic languages
(Awde et al., 2002).


According to Yu. V. Gankovsky, the Pathans emerged as a union of East-Iranian tribes
that provided the ethnic platform for Pathan ethnogenesis in the middle of the first
millennium C.E. (Gankovsky, 1982). Early precursors of the Pathans were the ancient
Iranian tribes spread across the eastern Iranian Plateau (Harvey, 2014). Depending on
whether they speak the Southern or the Northern Pashto dialect, the Pathans are called
Pashtuns or Pukhtuns, respectively. Phenotypically, Pathans generally resemble
Mediterranean people (Blood, 2001), with light hair tones and eye colors. Several
anthropologists have documented the oral traditions about the origin of the Pathans,
and some lend credence to the mythical oral traditions of the Pathan tribes themselves.
For instance, the Encyclopedia of Islam highlights the theory of Pathan descent from
Israelites, which can be traced to the “Maghzan-e-Afghani”, compiled in the 17th century
C.E. by Naimat Ullah Haravi, a famous scholar at the court of the Mughal emperor
Jahangir. Several other Pathan writers and historians followed the same theory in their
writings. The same theory was also mentioned by Olaf Caroe in his book “The Pathans”.


According to the writings of Sir William Jones, Afghans are considered to be one of the
lost tribes of Bani Israel, who broke away from the captivity of the Babylonian king
Bakht Nasr and took refuge in the area of Asarah in Afghanistan. Armia and Barkhia
were two of the six sons of Sawal, or Saul, a successor of Yahuda, a son of Hazrat Yaqub
(AWS). Armia had a son named Afghan and Barkhia had a son named Asif. After the
death of Sawal, Afghan was selected as commander of an army in the kingdom of
Dawood and Suleiman. Later, when Palestine was captured by Bakht Nasr, Afghan and
Asif migrated to Ghor and took control of the area from the local tribes. Similarly,
according to the Tabaqat-i-Nasiri, people called Bani Israel settled in Ghor, Afghanistan,
and then started migrating towards the south and east. It is documented that in 722
BCE, after the conquest by the Neo-Assyrian empire, the descendants of Afghan and
Asif who were expelled from their homeland, Palestine, and lost their way were ten
tribes in number. One of the lost tribes moved towards Mecca and met Khalid bin
Waleed, an Islamic warrior and an Israelite tribesman. He then sent a message to the
Afghans settled in Ghor to embrace Islam. A delegation under the command of Qais
travelled to Madina, where they met the Prophet Muhammad (PBUH) and embraced the
new faith of Islam, and Qais was given the new name “Abdur Rashid” by the Prophet.
Later he was given the title of “Bathan”; with the passage of time the B was replaced by
P, and all his descendants were called Bathans or Pathans.


Those Bani Israel references are in good agreement with the traditional view held by
Pathans that the tribe of Joseph, among other Hebrew tribes, inhabited this region when
the twelve tribes of Israel, descended from the twelve sons of Jacob, dispersed. The Bani
Israel theory stated in the Maghzan-e-Afghani has been questioned because of its lack of
historical justification and its linguistic inconsistencies. For example, the ten lost tribes
were exiled by Assyria, whereas the Maghzan-e-Afghani refers to the ruler of Persia
authorizing those tribes to go east to Afghanistan (Harawi et al., 1960). That
contradiction can be explained by Persian control over the Assyrian Empire's lands after
the conquest of Babylonia. Ancient authors provide no record of these Israelites moving
further east, yet the oral tradition has been widely accepted among Pathans. The
Pathans' residence in the region of Afghanistan is also mentioned in the Rig Veda, which
was probably composed before 1200 B.C. Ancient authors provide no evidence of an
Israelite or Jewish connection before the conversion of the Pathans to Islam.


Another group of historians believes in the Aryan descent of the Pathans. According to
some historians, Pathans are of Northern European lineage. Others trace their ancestry
to southern Russia, while some consider them to be of Mongolian and Chinese
Turkistani origin. A large group of modern researchers consider the area of “Bakhtar”,
between the Pamir and the Oxus River, to be their probable birthplace.


After a gradual increase in number, they started moving out of this area and spread
towards the valleys of the Swat and Indus rivers after crossing the Hindu Kush. This
group of people was called Indo-Aryans. They kept moving towards Punjab and finally
reached the valleys of the Ganga and Jumna, overthrowing the local Dravidian tribes.
Another group from Bakhtar migrated westward towards Iran. Some historians consider
them to be the ancestors of the Pathans. Modern researchers such as Fraser-Tytler
proposed a “Mixed Race Theory”, which documented the Aryan origin of the Pathans
with elements of Mongol, Turkish and other strains. This idea was strengthened by the
opinion of Charles Miler about the contribution of Greeks, Mongols, Persians, Scythians
and Turks to the Afghan stock. Several vestiges of the Syriac language were found
during mining and excavation in the Gandhara, Kandahar, Taxila and Lughman areas,
providing strong evidence of the presence of Syriac people in the area. Later, in the 5th
century, this land was invaded by the Hephthalites, the “White Huns”. During Umayyad
rule, Arabs invaded the area and merged with the local population. In the 13th century
the “yellow race” of Changez Khan also stepped in and mixed with the local Afghan
population.


Some historians trace the Pathan or Afghan lineage to Hazrat Ibrahim through his six
sons, who settled in the northwestern area of Iran, from where they were expelled and
came to inhabit “Parthia”, or Pasht or Pakht, for which they were later called “Pashtuns”
(Kakakhel, 1981).


These mythical oral traditions were strengthened by the long cultural and political
struggle between the Pathans and the Mughals, which explains the historical background
to the formulation of the myth; the mythology is internally inconsistent, and linguistic
investigation disproves any Semitic origin (Harawi et al., 1960). Several other Pathan
tribes claim to be descended from Arabs; some, such as the Sayyid or Kakakhel Mian,
even claim descent from the Prophet Muhammad (PBUH). Other Pathan groups, such as
the Afridi, Khattak and Sadozai from Peshawar and Kandahar, claim descent from
Alexander the Great of Greece (Mansoor et al., 2004).


2.3.4 Ancestral alias 

A Pathan can claim to be Pashtun only if his father is Pashtun. This patrilineal definition
is based on an important customary law of Pashtunwali, which has kept the tradition of
exclusively patriarchal tribal lineage intact. Under this definition, an ethnic Pathan's
identity depends less on the language he speaks than on who his father is. Thus Pathans
who have lost both the language and seemingly many of the ways of their alleged
ancestors can remain "Pashtun" by tracing their fathers' ethnic heritage back to the
Pashtun tribes, which are said to be descended from Qais Abdur Rashid, who has been
considered a possible progenitor of the Pathans (Allah et al., 2013). According to ancient
historians, Qais travelled to the Arabian Peninsula to meet the Prophet Muhammad in
Madina and, after embracing Islam, returned to Afghanistan and present-day Pakistan.
He then married Khalid bin Waleed's daughter and reportedly had many children,
including four sons, Sarban, Baitan, Ghourghusht and Karlan, who set out towards the
east, making their way towards Swat, Lahore, Multan and Quetta, respectively.


2.4 Study Area 


The study area comprises two geographically adjacent districts of Khyber Pakhtunkhwa
(KP) province of Pakistan, i.e. Charsada and Mardan. A location map of the study area is
provided in Fig.2.

Fig.2. Location map of Charsada and Mardan districts (Khyber Pakhtunkhwa), Pakistan.

2.4.1 The Charsada District


Charsada is one of the 25 districts in the Khyber Pakhtunkhwa province of Pakistan. The
town of Charsada was once part of the Peshawar region. The vast majority of the population
of the area is Pathan.


2.4.1.1 Geography 


It occupies a total area of 996 km² with a total population of approximately 1.02 million
according to the 1998 census report and 1.62 million according to a 2014 report. It is situated
between 34°03′ and 34°38′ north latitude and 71°28′ and 71°53′ east longitude. It is bordered
by Malakand district to the north, Mardan district to the east, Nowshera and Peshawar districts to the south,
and the Mohmand Agency of the Federally Administered Tribal Areas (FATA) to the west (Fig.2).
Because Charsada is geographically adjacent to Mardan district, its climate is similar to
that of Mardan.


Administratively it is divided into three tehsils (Charsada, Shabgadar and Tangi),
which together comprise 46 Union Councils.


2.4.1.2 History 


It is considered one of the most ancient and historical sites of Asia, marked by
several impressive mounds. Charsada was part of the
Gandhara kingdom, which had been part of the 7th satrapy of the Achaemenid Empire from around
516 BC until it was overthrown by Alexander the Great in the 4th century BC. The city
was captured by Alexander's army in 327 BC. The Indian sovereign Chandragupta
Maurya took control of Gandhara after the death of Alexander the Great in 323
BC. Later, Chandragupta's grandson Ashoka became emperor, and during his reign he built
a stupa there, 2.5 miles in perimeter, which was later described by Hieun Tsang, a
legendary Chinese Buddhist pilgrim, in 630. After the conquest by Mahmud of Ghazni and the
arrival of Islam in 1026, the name Gandhara faded away (Muhammadzai, 2012;
Ali, 1993). Between 250 and 125 BC, this area was also ruled by the Bactrian
Greeks, who were succeeded by the Indo-Greek Kingdom, which ruled until 10 AD.
Formerly, the town of Charsada was known as Pushkalavati (Lotus City). This name was
first cited in the Hindu epic the “Ramayana”. It was based on the cultivation of
lotus in this area, which is attested by archaeological records and coins found in
the area bearing the image of a goddess carrying a bunch of lotus flowers
(Marshall and Vogel, 1903: 176). According to the Ramayana, Ramachandra's brother
Bharata founded the two cities of Taksha and Pushkala, named after his two sons, after
the conquest of the Indus valley (Dani, 1963). According to Vaidya, these two towns were
named Taxila and Peukhlaoti in Greek history. The name was replaced by the Persian
name “Hashtnagar” with the arrival of Muslims in the region in the 11th century. Hasht
means “eight”, and the eight main villages, Prang, Rajjar, Utmanzai, Umarzai, Tangi,
Sherpao, Turangzai and Charsada Bazaar, gave Hashtnagar its name. This name was first
mentioned in Babur's memoirs of the 16th century (Babur, 1987: 55). Garrick was of the view
that the name Hashtnagar was a combination of two words from two different languages,
i.e. Persian and Sanskrit (Garrick, 1881-82). According to Cunningham, the name was
derived from the original name “Hastinagra”, after the ruler of this area,
Hasti or Astes (Cunningham, 1963). Later the name evolved into Charsada when Ilyas
Khan Muhammadzai, an Afghan conqueror, settled here in 1109 and divided the land
among his four sons. As a share of land was called Rasad in Persian, the name became
Char-Rasad, or four shares (Gopaldas, 1889). The ancient settlement extended over a large area,
and the entire vicinity is covered with vast ruins. Excavation was carried out in Charsada
district for about two months in 1902. Bala Hisar, an archaeological site, was excavated
twice, by Sir John Marshall in 1902 and by Sir Mortimer Wheeler in 1958.

According to Wheeler, “Bala Hisar was established by the Persians in the 6th century BC
as a camp protecting the eastern edge of their territory” (Wheeler, 1962). Several
interesting remains of coins and pottery ornaments were found. A series of post holes,
ceramic remains and other archaeological deposits recovered from the Charsada
excavations are dated to 1400 BC (Hussain, 1992). Deposits from subsequent periods indicate that
more permanent structures, including stone-lined pits, were also constructed at Charsada.
Between the 14th and 6th centuries BC, the people of Charsada developed an iron-working
industry and used ceramics distinctive of this period in the valleys of
Peshawar, Dir and Swat.


From the 6th century BC to the 2nd century CE, Charsada remained the capital and
administrative centre of the Gandhara kingdom. The area was ruled by many foreign invaders,
including Persians, Mauryans, Greeks (Alexander), Greco-Bactrians, Scythians,
Parthians, Kushans, Huns, Guptas and Turks (Gazetteer, 1997-98).


2.4.1.3 Tribes 


About 80% of the population of Charsada district belongs to the Muhammadzai tribe, which is
subdivided into groups locally called "khel". These sub-tribes (khels) have the status of khans
of the town. In addition, several Sayyid families also exist. Chief among them are the Kakakhel Mian/Sayyid,
who claim to be the descendants of Sheikh Rehamkar Hazrat Kaka Sahib, the most
venerated Pir or spiritual figure among the entire Pathan population of Pakhtunkhwa,
Pakistan, and Afghanistan. This tribe has gained special respect and honor among other
Pathan tribes because of its virtuous nature and pious ancestry. Its members are
regarded as landlords, holding lands of prime importance in Charsada and
neighboring districts.


2.4.2 The Mardan District


Pir Mardan Shah was a prominent religious figure of a small area, which was named after
him. Later, a vast surrounding area was merged into this small territory of Mardan.


As part of the Peshawar valley, this area once belonged to the Gandhara kingdom.


2.4.2.1 Geography and Topography 


The district occupies a total area of 1632 square kilometers between 34°05′ and 34°32′ north
latitude and 71°48′ and 72°25′ east longitude. It is bounded by Swabi and Buner
districts to the east, Buner district and Malakand protected area to the north, Nowshera district
to the south, and Charsada district and Malakand protected area to the west (Fig.2). On its
northern side the district is bounded by high mountains, with the highest points at Pajja or
Sarka (2056 m) and Garo or Pato (1816 m above sea level), while to the
southwest lies a vast fertile plain interspersed with some low hills. The plain
slopes steeply, carrying rainwater from the foothills to the Kabul River.
Streams flow from north to south. The important
streams of the area are the Kalpani, Baghiari Khawar, Mugam Khawar and Naranji
Khawar. The climate is considerably hot and humid in summer, with frequent dust storms
at night. December and January are the coldest months, with minimum temperatures as low
as 0.5°C. According to the 1998 census, the total population was approximately 1.46 million,
with a growth rate of approximately 3.01%.


2.4.2.2 History of Mardan district 


The area comprising Mardan district is part of the Peshawar valley, which first appeared
on the map as part of the Gandhara kingdom. With the invasion of Alexander the Great, the
mists of obscurity began to clear. Alexander's armies invaded the
Indus valley by two different routes, one via the Khyber Pass and the other through
Kunar, Swat, Bajaur and Buner, in 326 B.C. After Alexander's departure, Chandragupta
controlled this valley from 321 to 297 B.C. Buddhism flourished
in the Peshawar valley during the reign of the Buddhist emperor Asoka,
Chandragupta's grandson. In the reign of emperor Mehanda, after the Greeks took over
the valley, there was a revival of Brahmanism. After this, the Scythians and Indians
followed and held the valley until the 7th century A.D.

By the end of the 7th century, the Afghans had invaded the valley. At that time the Peshawar valley
was governed by the rulers of Lahore. The Afghans allied with the Gakkhars,
overthrew the Lahore rulers and took control of the hill country south of the Kabul River
and west of the Indus River. The area came under the command of Sultan Sabuktgin after
he defeated the Hindu ruler of Lahore, Raja Jaipal, in the 10th century. Thereafter the area
became a staging point for several incursions into the interior of India by Sultan
Mahmud Ghaznavi, the son of Sabuktgin. The Ghaznavid era came to an end when the Pathans
of Ghor took over in the 15th century. The area was then invaded by the Mughal emperor Babar
via the Khyber Pass in 1505 and remained under the authority of successive Mughal emperors
up to Aurangzeb. During his reign the Pathan tribes revolted and, in spite of
two years of continuous effort (1673-1675) to re-establish their authority, the Mughals were
forced to accept the terms set by the Pathans, who retained their independence. All the
territory west of the Indus, including present-day Mardan district, was ceded to Nadir
Shah by the Mughals in 1738. Ranjit Singh, after taking control of Attock in 1814 and
Peshawar in 1818, handed over command to Hari Singh Nalwa and withdrew to
Lahore. The Sikhs ruled this valley until 1849, when they were defeated by the British Army in
the Second Sikh War. Major Lawrence was appointed the first Deputy Commissioner
of Peshawar, and Peshawar was declared an administrative district under the
Government of Punjab. At that time the present Mardan district was part of
Peshawar district; later, in 1909, the Frontier Province came into being. Peshawar
district was divided into Peshawar and Mardan districts in 1937

(Hastings, 1878).

2.4.2.3 Ethnicity


Yousafzai Pathans constitute the majority of the Mardan population, but a sizeable Mohmand
population also exists in the Lundkhwar valley of Tehsil Takhtbhai. The origin of the
Yousafzai tribe is traced back to two brothers, Khakai and Gori, who gave their names to
the two divisions of the tribe inhabiting the area near Kandhar. In the middle of the 13th
century, the Khakai were expelled by the second group, the Ghoris, and
settled in the area near Kabul. After gaining wealth and fame and growing
in number, they divided into three clans, the Turklays, Gigyanis and Yousafzai.
By the end of the 15th century, the Yousafzai and Gigyani tribes started moving towards the
plains of Peshawar, ultimately overthrew the Dalazaks and dispersed into Buner district.
They finally settled in Mardan district, and the area became known as the “Yousafzai Plain”.


2.5 Tribes under Scientific investigation 


2.5.1 Muhammadzai 

The name Muhammadzai is also rendered as Mohammadzai, Mohammedzai, Mohmandzai and
Mamanzai (Murray, 1899). This Sarbani Pathan tribe should not be confused with the
Muhammadzai of the Barakzai Durrani lineage of Afghanistan. According to Pathan
family history, the Muhammadzai Pathans claim descent from Qais Abdur
Rashid, who had three sons: Sarban, Bait and Ghourghusht. His son Sarban is regarded as
the ancestor of the Muhammadzai of Charsada through his son Kharshban, while the Afghan
Muhammadzai are said to be descendants of Sharkhbun, Kharshban's brother. One of
the three sons of Kharshban, named Zamand, had a son named
Muhammad, who is the actual founder of this tribe (Caroe, 1957; Rose, 1997).
The Muhammadzai tribe is found primarily in Hashtnagar, an area in present-day Charsada
district. It is said to have settled initially in Khorasan, but migrated to what is now Pakistan and
was given Hashtnagar by the Yousafzai tribe (Elphinstone, 1815). In the 9th century the
area near Qandhar named Pusheen was inhabited by the Zaman tribe. The area was later
invaded by the Tareen tribe, and the Zaman tribe was expelled from its homeland.
Before this expulsion, the Muhammadzai tribe had started to disperse within Afghanistan and
reached Ningharhar, the eastern province of Afghanistan, where they fought the
Yousafzai tribe and were defeated. Even after this defeat they were not expelled from
the Yousafzai area of Ningharhar, and the Yousafzai asked for their help when they
planned an attack on the Dalazaks. The Muhammadzai joined the Yousafzai but asked to be awarded
the territory of Hashtnagar, which they received after the victory and still
hold. Mir Panda Khan, Malik Khyzer Khan and Hegi Khan were declared the tribal
leaders. Between 1526 and 1530, the Mughal emperor Babur conquered Peshawar and
overpowered the Muhammadzai and Yousafzai. After Babur, the Muhammadzai took
control of the area. The geography of the tribes is fundamental to their internal organization,
because the subdivisions of the tribe and their respective villages are known by the same
names.


2.5.2 Mohmand


For centuries a Mohmand tribe lived between the basins of the Oxus River and the Tarnak
River in Ghazni. This tribe is of purely Afghan origin and follows the strict pre-Islamic
code of honor, “Pashtunwali”. At the time of the British invasion of the Peshawar valley, the area
comprising present-day Mohmand Agency was under the influence of the local
tribesmen and was also governed by the Khan of Lalpura.


The Mohmand rebelled many times against foreign invaders, specifically against
British India. In 1893 many khels of the Mohmand tribe were assured by the Emir
of Afghanistan that they would not suffer severance because of their past association
with Afghanistan, and they became known as the Assured Clans. The Mohmand living in
Charsada and Mardan districts are said to have migrated from
Mohmand Agency, which is bounded by Charsada and the plains of Peshawar to the east.

There are also several sub-clans of the Mohmand, e.g. the Khawezai, Tarakzai, Baizai and
Halimzai, which are further subdivided into several sub-clans and divisions.


The famous Sanskrit grammarian Panini also referred, in his Ashtadhyayi (a grammar of
Sanskrit) of the 5th century BC, to a tribe named the Madhumants (the modern-day Mohmand),
who colonized the northwestern areas (Roy et al., 2007).


2.5.3 Yousafzai 


The Yousafzai tribe is considered the largest Pathan tribe. It originated in
Qandhar, in present-day Afghanistan, and, while migrating eastwards, arrived and settled
in Kabul in 1446, where it remained under the influence of the Turkic governor
Ulugh Beg.


After further eastward movement towards the Swat valley, they came into conflict with the local
Dalazak tribe. After 20 years of continuous effort under the command of their chief Malik
Ahmed Khan, the Yousafzai tribe, along with some allied clans (Jadoon and Uthmankhel),
ultimately succeeded in driving the Dalazaks from their homeland; after the battle
of Katlang, the Dalazaks were pushed eastwards to the Hazara mountains east of the Indus River.


In 330 BC Alexander mentioned a tribe named "Isapzais". A similar name was also noted
by Babur in the 16th century. Because of their uncooperative attitude, Akbar tried to
subdue them, sending a force under the command of Raja Bir Bar, who was killed
by the Yousafzai. Owing to their fierce nature, they could not be brought completely
under the dominion of the Mughal Empire until 1690 (Richards, 1993). Pir Baba was their
first emir. After Akbar Shah's death in 1857, Akhund Ghaffur took control of the
state and ruled it until the 20th century (Haroon, 2011).


The Yousafzai tribe is predominantly found in the districts of Swat, Shangla, Buner,
Malakand, Mardan, Tor Ghar, Swabi, Lower Dir and Upper Dir in the Khyber
Pakhtunkhwa province of Pakistan, with some also living in Battagram and Oghi
(Mansehra).
They also inhabit areas of Pakistan outside Khyber Pakhtunkhwa; e.g. a Brahui- and
Persian-speaking Yousafzai clan is found in Mastung district, Baluchistan.

The predominant language of this tribe is Pakhto, the northern variant of Pashto,
in which the hard “kh” replaces the soft “sh” of the southern variant. Some Yousafzai lineages are
found in Andhra Pradesh, Uttar Pradesh, Madhya Pradesh, Gaya, Bihar, Gujrat and
Banglore (India).


2.5.4 Kakakhel Mian 


The Kakakhel Mian are considered a prominent Sayyid clan. Their roots reach back to
Hazrat Ali bin Ismail bin Imam-e-Jafer Sadiq. Sayyid Kastir Gul, an Islamic Sufi or wali, is
their probable ancestor. Another probable ancestor is Sheikh Rehamkar, a student of
Sheikh Hazrat Akhund Adeen/Adyan Seljuki. The title of Rehamkar was given by
Sayyid Abdul Wahab Akhund Panju Baba.


Kastir Gul was affectionately called "Kaka Sahib", so his descendants are called
Kakakhel. The probable place of origin of this clan is a small village named "Kakasaib" in
Nowshera district, Khyber Pakhtunkhwa province, from where they spread throughout
the province, especially into the nearby area of Charsada district (District census report,
1998).


The word “Kaka” means “uncle” and “Khel” means “sons” or “children”, so the word
“Kakakhel” means “Children of the Uncle”. As Kakasaib was affectionately called
“Kaka” by everyone in the village, the name Kakakhel was given to his descendants.

Until the late 13th century, the Kakakhel Mian were little known beyond their own area
because of their small numbers. They gained prominence in the 19th century as
their population grew considerably. Their religious education and righteous
nature earned them high esteem among other Pathans. The Kakakhel have been
recognized as front-line fighters in battles against the Sikhs (Official Gazette, 1870). During the
British Raj, the Kakakhel made a number of contributions to society, proving to be
highly competent civil contractors, soldiers, diplomats and police officers.


The Kakakhel Mian also contributed to the independence movement and have
served the Pashto and Persian languages for decades. Several writers and poets of
Pashto literature belong to the Kakakhel clan of Sayyids. At the beginning of the 20th century,
intellectuals and writers from Kaka Sahib founded a union named "Milliah Rehamkaria",
to which they donated a collection of about 4000 books, magazines and newspapers. Sayyid
Bahadur Shah Zafar Kakakhel, a renowned intellectual, compiled a dictionary comprising
45000 words. Members of this clan use “Shah” as their surname (Gopaldas, 1874).


2.6 Mitochondrial genome 


Human mitochondrial DNA (mtDNA) is 16,569 base pairs long and codes for 13
proteins involved in oxidative phosphorylation, 2 rRNA and 22 tRNA molecules (Fig.3).
Several other proteins, e.g. RNA and DNA polymerases and RNA-processing enzymes, are
of nuclear origin and are imported into mitochondria across the double mitochondrial
membrane (Cummins, 1998). Because of the asymmetric distribution of Gs and Cs between the two
strands of the circular mtDNA, one strand is termed heavy (H) and the other light
(L). Mitochondrial DNA is a circular molecule present in multiple copies per cell, ranging
from 100 to 1000. This attribute makes it useful when only a small quantity of
the sequence of interest is available, especially in the study of fossils. Because, with a few
exceptions, only the egg contributes mitochondria to the developing embryo, mtDNA is
maternally inherited in all humans (Giles et al., 1980; Cummins, 2000). The mitochondria of the sperm are
marked by ubiquitin and degraded (Thompson et al., 2003). An
individual that harbors only one type of mtDNA among its multiple copies is said to be
homoplasmic, but becomes heteroplasmic when a mutation occurs in any of the mtDNA
copies during segregation (Cavalier et al., 2000; Alonso et al.,
2002). Unlike the nuclear genome, mtDNA has no recombination mechanism
(Meriwether et al., 1991). mtDNA also lacks an efficient repair mechanism,
which leads to a much higher mutation rate than in nuclear genome sequences
(Brown et al., 1979, 1982; Saccone, 2000). This attribute is the basis of sequence variation
among individuals (Olivio et al., 1983). Its high copy number makes it a useful tool in
DNA profiling for forensic investigations (Allen et al., 1998).

Fig.3. The 16,569 bp circular human mitochondrial genome. The control region (D-loop)
contains the two hypervariable regions, HVI and HVII, which are commonly used in
population genetic investigations (Lembering, M. 2013).

2.6.1 Tracing human migration from mtDNA


A few decades ago, the only way to trace human migration and gain insight
into human evolution was to uncover skeletal remains. However,
advances in molecular biology over the last two decades have provided several
breakthroughs for anthropologists. Until the late 19th century, analysis of
biomolecules such as proteins proved a useful tool for evolutionary studies, but
with less accuracy. With the introduction of new genetic tools and molecular
techniques, it has become easier for anthropologists to gain deep insight into intra- and
inter-population variation and to formulate new theories of evolution and migration.
DNA, whether nuclear or extranuclear, records the mutations that accumulate in its
sequence with the passage of time and so acts as a biological calendar. The role
of mitochondria in several biological and physiological processes has long been known,
but their role in tracking evolutionary history has been studied only in the last three
decades. The rate at which mutations accumulate over time defines the rate
of evolution, which can be used to estimate the time of divergence between lineages.
For example, genetic and paleontological data suggest that humans and
chimpanzees share a common lineage and diverged nearly 5 million years ago (Ingman et al.,
2000). The mitochondrial genome does not evolve at the same rate throughout (Pesole
et al., 1999). The highest mutation rate is found in the non-coding, triple-stranded stretch of
mtDNA called the D-loop or displacement loop (Wallace et al., 1995), which is said to contain three
hypervariable regions, HVI, HVII and HVIII (Stoneking, 2000). This region comprises


about 7% of mtDNA and has been studied in detail (Ingman et al., 2000).
The mutation rate in the D-loop region is 10 times higher than in the coding region (Howell et
al., 2007). On average, the D-loop sequences of two unrelated individuals chosen at
random differ at 1-3 bp per 100 bp (Piercy et al., 1993). Some kinds of
mutation, e.g. back mutation and parallel mutation, make comparisons difficult.
The presence of mutational hot spots likewise complicates comparison,
but sequencing of the complete human mitochondrial genome has provided many
polymorphic sites for comparison. The HVI region spans nucleotide positions (np) 16024-16365,
HVII spans np 73-340 and HVIII spans np 438-574. To annotate population
differences, mtDNA sequences are compared with the Revised Cambridge Reference
Sequence (rCRS) (Andrew et al., 1999). Analysis of HVR1 and
HVR2 of mtDNA indicated that the human and Neanderthal lineages split around 500,000
years ago (Krings et al., 1997), and it was also hypothesized that Neanderthals made no
contribution to the modern human mtDNA gene pool (Krings et al., 1997, 1999; Ovchinnikov
et al., 2000). The first phylogenetic tree for mtDNA, based on restriction site
polymorphism, was constructed by Cann et al. in 1987, and the maternal most recent
common ancestor, called
Mitochondrial Eve, was dated to around 200,000 years ago in Africa (Cann et al., 1987). Like
Y-STRs, mtDNA polymorphisms are haploid markers and are therefore well suited to
phylogenetic and population-origin investigations, including tracing human migrations.
Such polymorphisms are geographically structured, because mutations have accumulated in
mtDNA lineages over the course of time (Ingman et al., 2001). These maternally
inherited variants constitute haplotypes. A specific set of SNPs defines a haplogroup, and
various haplotypes belong to the same haplogroup on the basis of shared
haplogroup-defining SNPs (Torroni et al., 1996). Haplogroups L0-L7 are considered the most
ancient haplogroups, of African origin (Quintana et al., 1999), while haplogroup L3 is
considered the common ancestor of haplogroups M and N, which later diversified
into many variants during and after the out-of-Africa migration (Richards et al., 2006;
Chen et al., 1999). A generalized sketch of human migration as reflected in mtDNA is
provided in Fig.4.

Fig.4. An overview of the worldwide distribution of mtDNA haplogroups (Schriver &
Kittles, 2004).
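
The hypervariable region boundaries and the reference-based notation described in this section can be illustrated with a short Python sketch. It is only a minimal illustration and not the pipeline used in this study: the function names (hv_region, call_substitutions) and the toy sequences are assumptions made for the example, and only substitutions between two already aligned, equal-length sequences are handled (no insertions or deletions).

```python
# Hypervariable region boundaries (nucleotide positions, rCRS numbering)
# as quoted in the text above.
HV_REGIONS = {
    "HVI": (16024, 16365),
    "HVII": (73, 340),
    "HVIII": (438, 574),
}


def hv_region(position):
    """Return the name of the hypervariable region containing a position, or None."""
    for name, (start, end) in HV_REGIONS.items():
        if start <= position <= end:
            return name
    return None


def call_substitutions(reference, sample, start):
    """List substitutions of `sample` relative to `reference`.

    Both strings must be pre-aligned and of equal length; `start` is the
    position of their first base. Each variant is written as reference
    base, position, observed base (for example 'C16223T').
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for offset, (ref_base, obs_base) in enumerate(zip(reference, sample)):
        if ref_base != obs_base:
            variants.append(f"{ref_base}{start + offset}{obs_base}")
    return variants


# Toy example: a made-up 10-base stretch, NOT real rCRS sequence.
toy_reference = "ACCTGATCCA"
toy_sample = "ACTTGATCGA"
toy_variants = call_substitutions(toy_reference, toy_sample, start=16220)
print(toy_variants)
print(hv_region(16223), hv_region(150), hv_region(10400))
```

In practice the sample would first be aligned to the full rCRS, and the resulting list of differences would then be passed to a haplogroup classifier.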
2.6.2 mtDNA Haplogroups in South Asia


2.6.2.1 Haplogroup M 


Macrohaplogroup M, along with its sibling N, accounts for all non-African mtDNAs.
This haplogroup originated during the human migration out of Africa 57-75 Kya
(Chandrasekar et al., 2009). Soares et al. (2009) estimated its age at 49,400 years (95% CI:
39,000-62,200 years) in South Asia and 60,600 years (95% CI:
47,300-74,300 years) in East Asia. Other estimates include 55-73 Ky for the lineage among
African populations (Chen et al., 1995) and 69.3 ± 5.4 Ky among Chinese populations
(Kong et al., 2003). Because it accounts for 60-70% of the Indian
population, haplogroup M is considered a South Asian lineage
(Chandrasekar et al., 2009; Disotell, 1999), whereas lower frequencies of this haplogroup have
been estimated in Central Asia and western Eurasia. It is believed to have arrived in the
Indian subcontinent via the Southern Route migration from Africa (Disotell, 1999;
Macaulay et al., 2005; Torroni et al., 2006; Chandrasekar et al., 2009; Kumar et al., 2009).
Sub-group M7 is a common lineage in East Asian populations such as Korean-Chinese,
the Han (Beijing) of China, Mongolians, Koreans (Jin et al., 2009) and Japanese
(Asari et al., 2007). The Hazara, Balouch and Pathan populations of Afghanistan exhibit
frequencies of 15%, 13.3% and 7.1% respectively for this haplogroup (Whale et al., 2012).
Within the Indian subcontinent its frequency ranges from 26% to 64%
among different caste and tribal populations (Kivisild et al., 1999; Quintana-Murci et al.,
2004), while the frequencies found within the Afghan populations seem to resemble
those found elsewhere in Central Asia (Whale et al., 2012).


2.6.2.2 Haplogroup N 


The second macrohaplogroup to have diverged from the African lineage L3 is
haplogroup N. Its age has been estimated at 64.6 ± 6.8 Ky (Kong et al., 2003), 61,900 YBP in
west Eurasia (95% CI: 49,200-75,000 years), 71,200 YBP in South Asia (95% CI: 55,800-87,100
years) and 58,200 YBP in East Asia (95% CI: 44,100-72,800 years) (Soares et al., 2009). It is
considered the ancestor of many haplogroups found in Europe, the Middle East, Asia
and the Americas (among the Amerindians). This lineage originated soon
after, or probably during, the migration out of Africa and is typically considered a
southwest Eurasian lineage (Kivisild et al., 1999; Quintana-Murci et al., 2004; Nasidze et
al., 2006, 2007). The haplogroup is comparatively common in western Eurasia and is also
present in Europe. A frequency of 5.3% has been reported in eastern Crete (Martinez et
al., 2008), while haplogroups N, I, W and X together constitute approximately
9% of the Finnish population (Hedman et al., 2007). This haplogroup is also found in the
Near East and northeast Africa: ~13% in Egypt, ~10% in Israel,
Syria and Jordan, ~5% in Iraq and ~23-44% among Iranian populations (Nasidze et al.,
2008). The Hazara population of Afghanistan exhibits a frequency of 7.5% and the Tajiks
10.5% for this haplogroup (Whale et al., 2012). Elsewhere in Central Asia and western
Eurasia, the frequency of haplogroup N ranges from 2.3% in the Tajiks of
Tajikistan (Derenko et al., 2007) to 20% in the South Caspian region of Iran (Comas et al.,
2004). The higher frequencies of this haplogroup appear to occur in the more western
populations rather than in Central or South Asian populations. Haplogroup M is abundant in
South Asia, whereas haplogroup N is rare in its mtDNA landscape,
with frequencies of 2.6% and 2.9% in the Brahui of Baluchistan and the Gujarati of
northwestern India, 3% in both Pakistani and Makrani populations, and 7.7% within
the Han Chinese population (Yang et al., 2011). The frequencies observed in Central Asia
are similar to those found in the Hazara and Tajiks, with Uzbeks harboring 7.1%
(Quintana-Murci et al., 2004) and the Turkmen population 10% (Comas et al., 2004).
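
Haplogroup frequencies such as those quoted above are simple proportions of the samples assigned to each haplogroup. The following Python sketch shows that arithmetic under stated assumptions: the function name haplogroup_frequencies and the ten sample labels are hypothetical and are not data from this study or from the cited works.

```python
from collections import Counter


def haplogroup_frequencies(assignments):
    """Return the percentage frequency of each haplogroup label.

    `assignments` holds one haplogroup label per sampled individual.
    """
    if not assignments:
        raise ValueError("no haplogroup assignments supplied")
    counts = Counter(assignments)
    total = len(assignments)
    # Percentages are rounded to one decimal place, as in the text.
    return {hg: round(100.0 * n / total, 1) for hg, n in counts.items()}


# Hypothetical labels for ten samples (illustration only).
example = ["R", "R", "M", "R", "N", "M", "R", "R", "M", "R"]
freqs = haplogroup_frequencies(example)
print(freqs)
```

With small samples such percentages are imprecise, which is worth bearing in mind when comparing frequencies between populations.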


2.6.2.3 Haplogroup R 


As a descendant of macrohaplogroup N, haplogroup R also diverged soon after
the modern human migration out of Africa. Along with macrohaplogroups M and
N, R is one of the founder lineages for Eurasian settlement ~60-65 Kya (Torroni et al.,
2006). It has an estimated age of 59,100 years in west Eurasia (CI: 47,100-74,100 YBP),
66,600 years in South Asia (CI: 52,600-81,000 YBP) and 54,300 years in East Asia (CI:
41,200-67,800 YBP) (Soares et al., 2009), and of 62.3 ± 6.3 Ky according to Kong et al.
(2003). This haplogroup is a typical west Eurasian and South Asian lineage, primarily due
to its early divergence from haplogroup N in this region, and can be characterized by an
MboII site gain at np 12704 caused by a transition at np 12705. Haplogroup R makes up
slightly less than 3% of the maternal gene pool of the Finnish population (Hedman et al.,
2007). It has often been recorded in Central Asia, South Asia and western Eurasia; however,
its distribution is not uniform throughout these regions (Whales et al., 2012).
The Karakalpak population presents a frequency of 10% (Comas et al., 2004), while in the
Gujarati population of northwest India it appears in 8.8% of mtDNAs and in 1.8% of
Georgians (Quintana-Murci et al., 2004). Within the Afghan populations, the Pathans
exhibit 28.6%, Tajiks 15.8% and the Hazara 7.5% of this haplogroup. However, the
greatest frequency was found in the Uzbeks, at 20%, while in the south
Caspian region, haplogroup X was found in 2.4% of Persians and 9.5% of Mazandrians.
The presence of these three lineages within the Afghan populations and the adjacent
populations from Iran, Central Asia and the Indian Subcontinent may be attributable to
this region being the initial territory where haplogroups M, N and R settled following the
human dispersal from Africa. Although the lineages share similar coalescent ages,
haplogroup M is prominent among South Asian populations, particularly among those in
Andhra Pradesh in southern India, where its frequency has been recorded at 64% (Whales
et al., 2012). An overall frequency of 61% for this haplogroup was recorded in the
Pathan ethnic group of Pakistan (Rakha et al., 2011).


An updated mtDNA haplogroup phylogenetic tree (Fig.5) is used for population
haplogrouping and classification on the basis of origin (Van et al., 2009).

Fig.5. An overview of the mtDNA haplogroups (Van et al., 2009).


2.7 The Y-Chromosome 


The human Y-chromosome is considered an evolutionary remnant of the X
chromosome. Besides determining sex, it carries many functional genes essential for
normal male development. The X and Y-chromosomes are considered to have been true
homologues roughly 300 million years ago (Lahn et al., 2001). Successive mutations,
especially deletions, have since reduced the Y chromosome to about 50 megabases, but
some sequence homology between the two chromosomes still persists.


2.7.1 The Structure of the Y-Chromosome


The Y-chromosome is the smallest chromosome of the human genome. Its
average size is roughly 60 Mb, of which about 24 Mb comprises the euchromatic region
and around 30 Mb the heterochromatic region. Collectively they are referred to
as the MSY (male-specific region) or NRY (non-recombining region), constituting about
95% of the Y chromosome (Butler, 2003). Its large number of short tandem repeats (STRs)
and single nucleotide polymorphisms (SNPs) contribute to a powerful
and informative haplotyping system that has long been used in human
identification (Jobling et al., 2001). A sketch of the Y-chromosome is given in Fig. 6.


Besides forensic DNA typing, several genealogical and evolutionary studies have made
use of the continuously growing set of reliable Y-chromosomal short tandem repeat (STR)
and single nucleotide polymorphism (SNP) markers (Butler, 2003). Almost 99.99% of the
Y-chromosome is inherited as a single unit from father to son (Butler,
2003). The chromosome remains intact and the markers along its entire length
remain strongly linked to each other. In this way a single chromosome acts as a single
haplotype that can be used to trace human migration and evolutionary events (Butler,
2003). Because the majority of the Y chromosome does not recombine, haplotypes pass
intact from generation to generation and
preserve a simple historical record of descent and phylogeny (Helena et al., 2007).
Unlike the rest of the chromosomes, mutation is the only source of variability in this case.
Owing to its small effective population size, i.e. one quarter that of the autosomes and
one third that of the X-chromosome, lower sequence diversity is also expected on the
Y-chromosome than on any other chromosome (Hammer, 1995; Thomson et al., 2000).
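The one-quarter and one-third ratios quoted above follow from counting chromosome copies in a breeding population. The sketch below is only an illustration under a simplifying assumption that is not stated in the text: equal numbers of breeding males and females. The function names and the population size of 500 are hypothetical.

```python
from fractions import Fraction


def chromosome_copies(n_males: int, n_females: int) -> dict:
    """Count chromosome copies carried by a breeding population."""
    return {
        "autosome": 2 * (n_males + n_females),  # two copies in every individual
        "X": n_males + 2 * n_females,           # one in males, two in females
        "Y": n_males,                           # one in males only
    }


def y_relative_size(n_males: int, n_females: int) -> dict:
    """Ratio of Y-chromosome copies to autosomal and X-chromosome copies."""
    copies = chromosome_copies(n_males, n_females)
    return {
        "Y/autosome": Fraction(copies["Y"], copies["autosome"]),
        "Y/X": Fraction(copies["Y"], copies["X"]),
    }


# Assumed example: 500 breeding males and 500 breeding females.
ratios = y_relative_size(500, 500)
print(ratios["Y/autosome"], ratios["Y/X"])  # 1/4 1/3
```

With unequal numbers of breeding males and females the ratios would differ, which is one reason the quoted values should be read as expectations rather than fixed quantities.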
Fig.6. Structure of the various regions of the human Y-chromosome (Skaletsky et al., 2003).


2.7.1.1 Pseudoautosomal regions (PARs) 


The telomeric regions at the distal ends of the short arm (Yp) and long arm
(Yq) are called pseudoautosomal region 1 (PAR1) and pseudoautosomal region 2
(PAR2), respectively. PAR1 is approximately 2.5 Mb long, while PAR2 is less than
1 Mb in length (Skaletsky et al., 2003). The Y chromosome recombines with its counterpart,
the X chromosome, only at these regions during meiosis, in a fashion similar to
that of autosomal loci (Butler, 2003).


2.7.1.2 Heterochromatic region


The 30 Mb region on the long arm (Yq) of the Y chromosome, comprising two repeat
sequences, i.e. DYZ1 and DYZ2, alphoid sequences and several satellite sequences, is called
the heterochromatic region of the Y chromosome (Skaletsky et al., 2003; Gusmão et al., 1999).
These sequences are clustered tandemly near the centromere and play an important role in
spermatogenesis (Gusmão et al., 1999). In addition, the Y-chromosome carries
two DYZ1 repeat fragments, each 2.1 kb in length, and two other fragments, i.e. the
Y-specific (YS) and the non-Y-specific (NSY) fragments, each 3.4 kb in length (Gusmão et al.,
1999). These fragments consist of a single array of pentameric satellite sequences with the
repeat sequence 5′-TTCCA-3′ in 800 to 4000 copies (Gusmão et al., 1999).


2.7.1.3 Male specific region (MSY) or non-recombining region (NRY) of Y-chromosome

Almost 95% of the Y chromosome, the portion responsible for sex determination in males,
is called the NRY (non-recombining region of the Y chromosome) or MSY (male-specific
region of the Y chromosome). It consists of a mosaic of heterochromatic and
euchromatic regions (Skaletsky et al., 2003). The euchromatic region is further divided
into three classes, i.e. the X-Degenerative Region, the X-Transposed Region and the
Ampliconic Region. The ampliconic region carries a total of 156 transcription units
(Skaletsky et al., 2003).


2.7.1.4 X-Degenerative Region of Y chromosome


About 20% of the euchromatin consists of eight sequence blocks on Yp and Yq, comprising
almost 8.6 Mb of sequence, called the X-degenerative region of the MSY. Almost 27 X-linked
single-copy genes are located in this region, of which 13 are nonfunctional
pseudogenes while the remaining 14 are functional Y-linked genes, some of which are
homologous to X-linked genes (Skaletsky et al., 2003).


2.7.1.5 X-Transposed Region of Y chromosome 


The X-transposed region of 3.4 Mb on the short arm of the Y chromosome (Yp) comprises
almost 15% of the MSY euchromatin and occurs in two sequence blocks (Skaletsky et al., 2003).
About 99% of the X-transposed sequences are homologous to sequences on the long
arm of the X chromosome (Xq). This region has the lowest gene density and the highest
interspersed repeat density, and unlike the PARs it does not recombine (Skaletsky et al., 2003).


2.7.1.6 Ampliconic Region on Y chromosome 


The ampliconic region of 10.2 Mb constitutes about 30% of the total MSY euchromatin and
consists of seven sequence blocks on both the short arm (Yp) and the long arm (Yq). These
sequences have the highest gene density of the three classes of MSY euchromatin
and the lowest density of LINE1 and other interspersed repeat elements
(Skaletsky et al., 2003). Nine protein-coding gene families have been identified on the MSY.
Among them, XKRY, PRY, VCY and HSFY have two copies, BPY2 exists in three copies,
DAZ and CDY exist in four copies, and RBMY exists in six copies. All of them
are involved in male fertility.

Fig.7. Protein-coding genes on the male-specific region of the Y chromosome with
homologues on the X-chromosome (right side) and genes not found on the X-chromosome
(left side) (Skaletsky et al., 2003).


2.7.1.7 Palindromic DNA sequences


There are eight long palindromic sequences in the ampliconic region on Yq, denoted
P1-P8, with arm lengths ranging from 9 kb to 1.45 Mb. The two arms of each palindrome are
separated by a spacer sequence of 2 kb to 170 kb (Rozen et al., 2003). P1 is the largest
palindrome, spanning 2.9 Mb. It contains two secondary palindromes, P1.1 and P1.2, each
24 kb in length. The total length of all eight palindromes is approximately 5.7 Mb, that
is, about 25% of the total MSY euchromatin (Skaletsky et al., 2003).


2.7.1.8 Sex determining region on Y chromosome (SRY) 


Within the euchromatic region of the MSY, adjacent to the pseudoautosomal
region on the short arm of the Y chromosome, lies the sex-determining region (SRY). It
controls male sexual development by encoding the testis-determining factor (TDF)
(Sinclair et al., 1990).


2.7.1.9 Amelogenin gene (AMEL) 


The amelogenin gene on the Y-chromosome (AMELY) and its homologue on the
X-chromosome (AMELX) play an important role in tooth bud growth through enamel production
(amelogenesis) and dentine development (dentinogenesis) (Butler, 2005). This locus is
primarily used for sex determination and for the analysis of mixed male/female samples.
It is not only a practical tool but a requirement in forensic casework,
particularly in cases of sexual assault, skeletal remains and old bloodstains, which require
swift, sensitive, reliable and accurate means of analysis (Butler, 2005).


2.7.2 Polymorphisms on the Y-chromosome


To trace human migration, evolutionary history and population admixture,
the paternally inherited non-recombining region of the Y chromosome (NRY) has
been used extensively over the last two decades. This region holds
various polymorphisms with varying mutation rates that serve as a powerful tool
for such research. The genetic markers in the NRY
are physically linked and are transmitted to the males of the next generation,
i.e. in patrilineal fashion. This linkage and lack of recombination result in the lack of
independent assortment of these markers and in increased susceptibility to genetic drift,
owing to the small effective population size compared to autosomes. Drift enhances the
differentiation between Y-chromosomes of different populations and
is useful for investigating past events in a specific population; however, it also changes
haplotype frequencies in a population over time. Clustering patterns are also influenced
by the behavior of males bearing the Y-chromosome, e.g. patrilocality (Murdock, 1967;
Burton, 1996). Patrilocality is mostly practiced in South Asia (Kayser et al., 2001).


2.7.2.1 SNPs the Bi-allelic Markers 


The most abundant markers are the bi-allelic markers, which include the Alu
insertion (YAP, DYS287) and a considerable number of SNPs (single nucleotide
polymorphisms) (Underhill et al., 1996). A SNP is a single-nucleotide change
in the base-pair sequence resulting from a substitution, either a transition or a
transversion. More than 33,000 Y-SNPs have been discovered and used so far in population
phylogenetic studies and forensic investigations (Hallast et al., 2015). The main
use of SNPs is the construction of haplogroups in population genetics and evolutionary
studies. Appropriate and careful selection of SNPs is essential, as different
SNPs can define the same haplogroup (Sanchez et al., 2004).
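The distinction between transitions and transversions mentioned above can be made concrete with a short sketch. It relies only on the standard definition (a transition exchanges a purine for a purine or a pyrimidine for a pyrimidine, a transversion exchanges one class for the other); the function name is hypothetical and the example bases are illustrative, not taken from any marker discussed here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    # Same chemical class on both sides means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("C", "T"))  # transition
print(classify_substitution("A", "C"))  # transversion
```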


2.7.2.2 Alu Polymorphisms on Y chromosome (YAP) 


Allelic variants are created by mutations involving insertions or deletions,
usually of one nucleotide, at specific loci along the Y-chromosome. One important
insertion is the Y-chromosome Alu polymorphism (YAP), which consists of approximately
300 bp (Hammer et al., 1994).


2.7.2.3 Multi-allelic markers (Micro and minisatellites) 


Microsatellites are sequences with repeat units of 2 to 7 bp in length and are mostly
referred to as short tandem repeats (STRs) (Ellegren, 2000; Imad et al., 2014). The type of
repeat sequence is defined by the length of the core repeat and the number of repeat units
(Butler, 2012). They may be di-, tri-, tetra-, penta- or hexanucleotide repeats; penta- and
hexanucleotide repeats are also considered microsatellites. STRs can be used to study
evolution and migration footprints as well as to resolve medico-legal cases
(Walkinshaw et al., 1996).

Several hundred microsatellites, or Y-STRs, of potential importance in forensic
casework and evolutionary studies have been identified on the Y-chromosome (Gil et
al., 2012). The number of repeats is highly variable among individuals, which makes them
good candidates for population genetic studies (Budowle, 1995; Butler et al., 2009;
Mohammed and Imad, 2013).
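Because an STR allele is essentially the number of consecutive copies of the core repeat, the idea can be sketched in a few lines. The example below is a toy illustration only: the function name, the GATA motif and the sequence are assumptions made for the example, and real allele calling is based on fragment sizes compared against an allelic ladder rather than on string matching.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    motif = motif.upper()
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Toy sequence: flanking bases, 11 tandem GATA units, an interruption, 2 more units.
toy_sequence = "ACGT" + "GATA" * 11 + "CCTG" + "GATA" * 2 + "TT"
print(longest_repeat_run(toy_sequence, "GATA"))  # 11
```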


Similarly, minisatellites, whose repeat units vary in length from 10 to
60 base pairs, are also present on the Y chromosome (Gusmão et al., 2005). The number of
repeats defines the allele nomenclature at a specific locus. Changes occasionally occur in
the number of repeats of these STRs or minisatellites. The most probable cause of
intra-allelic mutations in microsatellites is
replication slippage (Jeffreys et al., 1998). Other than mutation, selection is another source
of haplotype diversity in a population. Paternal lineages can be traced by tracking these
changes (Gusmão et al., 2005). The mutation rate is high at minisatellite loci,
approximately 6-11% per generation, while for Y-STR microsatellites it is approximately
0.2% per generation. The mutation rate for SNPs, on the other hand, is
significantly lower (Dupuy et al., 2004). Multi-allelic markers have proved extremely
useful in differentiating Y-chromosome haplotypes at fairly high resolution
(Butler, 2003). The major fraction of the Y-STR loci is located on the long arm (Yq) of the
Y-chromosome: almost 25.3% of the STRs are at Yq11.221, 16.6% at Yq11.222
and almost 18.4% at Yq11.223 (Hanson et al.,
2006). A further 22.1% are found on the short arm of the Y-chromosome at Yp11.2, and three
loci, DYS716, DYS707 and DYS631, are located in the centromeric segment (Hanson et al., 2006).
Y-STR markers offer several advantages over autosomal markers, with a few
limitations.
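To give a feel for what a per-locus rate of about 0.2% per generation means for a multi-locus haplotype, the following sketch computes the chance that a haplotype is passed on unchanged. It is a back-of-the-envelope illustration under two assumptions that are not made in the text: every locus mutates independently, and all loci share the same rate. The function name and the choice of 23 loci are illustrative.

```python
def prob_unchanged(per_locus_rate: float, n_loci: int, generations: int = 1) -> float:
    """Probability that no locus mutates over the given number of generations.

    Assumes independent loci that all share the same per-generation rate.
    """
    return (1.0 - per_locus_rate) ** (n_loci * generations)


# Assumed example: 23 Y-STR loci, each mutating at 0.2% per generation.
one_generation = prob_unchanged(0.002, 23)
ten_generations = prob_unchanged(0.002, 23, generations=10)
print(f"father to son: {one_generation:.3f}")
print(f"over ten generations: {ten_generations:.3f}")
```

Under these assumptions most father-son pairs share an identical haplotype, while differences accumulate over deeper genealogies, which is the property that lineage tracing relies on.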


2.7.2.4 Unique and recurrent event polymorphisms 

Y chromosome markers are classified on the basis of their mutation rates into recurrent
and unique event polymorphisms (Hurles and Jobling, 2001). SNPs, YAP and indels
exhibit low mutation rates and are generally called “unique event polymorphisms” (UEPs),
while microsatellites and minisatellites exhibit high mutation rates and mutate
recurrently. UEPs are rarely
occurring polymorphisms that are assumed to be population specific and occur at
specific loci as a consequence of evolution (Hurles and Jobling, 2001). Binary
polymorphisms of unique origin are pooled into monophyletic
composite haplotypes (haplogroups), which branch off phylogenetically from a single most
parsimonious tree. Hurles and Jobling estimated the human-ape divergence
time on the genetic evolutionary clock, and the TMRCA (time to most recent common
ancestor) of extant human Y chromosomes has been estimated to be between 50,000 and
200,000 years ago (Hurles and Jobling, 2001).


2.7.3 Applications of Y-chromosomal polymorphism 


Several hundred Y-STRs have been discovered so far and are available as a standard
tool for forensic and population genetic investigations, but only those with high genetic
diversity and variance have proved useful in such cases (Hanson et al., 2006).
Rapidly mutating Y-STRs are able to discriminate between brothers
in more than 60% of cases (Ballantyne et al., 2012).


2.7.3.1 Y polymorphism, a tool in Forensic investigations 

Y-chromosomal polymorphisms, primarily STRs, are a useful discriminating tool
in forensic studies, specifically in cases of sexual assault. They play a
substantial role in generating valid genetic profiles of male suspects in
gang rapes and in male/female or male/male sexual assault cases (Hanson et al., 2006).


2.7.3.2 Paternity testing casework 


Uniparental inheritance along the patriline makes the Y chromosome and its Y-STRs
useful for tracing the paternity of male offspring. For this purpose, Y-DNA from
any male relative in the lineage can be helpful (Gusmão et al., 1999). A key requirement in
forensic investigation and paternity testing is the accurate interpretation of genetic
profiles, taking into account the probability of STR mutations that could influence the
inclusion or exclusion of an alleged father (Kayser et al., 2001). Different sets
of Y-STR markers are used for this purpose. Y-STR haplotypes capture
information from non-coding regions of the Y-chromosome that is shared among all
males along a paternal line. Unlike autosomal STR loci, they do not provide
individualization, but they have proved valuable and conclusive in paternity tests of
male subjects where autosomal and other STRs could not resolve the case (Gusmão et al., 2005).


2.7.3.3 Implication of genealogical and Evolutionary investigations

Several sets of minisatellites, microsatellites (STRs) and biallelic markers are good
candidates for evolutionary and genealogical investigations because of their typing
simplicity and high level of diversity. Network analysis of haplotypes based on these
markers has proved informative for evolutionary and phylogenetic studies (Gusmão
et al., 1999). Several hundred SNPs and Y-STRs are used to investigate the
biogeographical ancestry of different populations and sub-populations (Underhill et al.,
2001; Ali et al., 2003).

Owing to their very low mutation rates and the lack of recombination, SNPs are well suited
to the construction of maximum parsimony trees (Jobling et al., 2001).


2.7.4 Y-Haplogroups 


Human Y-chromosome haplogroups are defined by Y-chromosomal polymorphisms,
specifically SNPs. Each haplogroup represents a branch of the Y-chromosome
phylogenetic tree. The patrilineal most recent common ancestor of all living humans is
referred to as “Y-chromosomal Adam”.

A system for naming Y-DNA haplogroups was developed by the Y Chromosome
Consortium (YCC). Initially 18 Y-haplogroups were defined by the YCC, but later, with the
discovery of more SNPs, their number was increased to 20, ranging from A to T. The frequency
and occurrence of these haplogroups vary across geographical regions, and some
haplogroups are also found to be population specific.

Fig.8. World map of Y-DNA haplogroups with possible migration
routes (https://en.wikipedia.org).


2.7.4.1 Y-Haplogroups in South Asia 


South Asia includes present-day Pakistan, India, Afghanistan, Bangladesh, Nepal,
Bhutan, Maldives and Sri Lanka. Several Y-DNA haplogroups, including C, F, G, H, J, L,
O, P, Q, R1a, R1b, R2 and T, occur at varying frequencies in the different ethnic groups
inhabiting South Asia.


2.7.4.1.1 Haplogroup R 
Haplogroup R is said to have originated around 30,000 years ago in Central Asia as a branch
of the major haplogroup P. There it divided into two branches, R1a and R1b.
R1a moved towards Europe and R1b migrated towards South Asia. R1a is believed
more likely to have originated in South Asia and less probably in the Eurasian steppes,
among the people of the Kurgan culture, who practiced pit-grave burial and horse
domestication and spoke Indo-European languages. The high frequency of haplogroup R1a in
South Asia is the consequence of Indo-Iranian migrations.


2.7.4.1.2 Haplogroup Q 


The probable site of origin of this haplogroup is south-central Siberia or the Altai
Mountains, around 17,000 to 31,700 years ago (Zegura et al., 2004; Sharma et al., 2007), and
it has been postulated to have reached South Asia with the Indo-Aryan migrations. It is
found in different populations of South Asian countries at varying frequencies.


2.7.4.1.3 Haplogroup F 


This haplogroup has been found in South India and is considered to be the parent of all
other haplogroups (G-T) (Fig.9). It prevails in almost 90% of the world's
population. Haplogroups G, H, IJ and K are considered to be the major sub-haplogroups
of F. Its probable place of origin is either Eurasia, around 48,000 years ago (Karafet et
al., 2008), or South Asia. According to another study, it originated in the Levant or the
Arabian Peninsula around 50,000 years ago (Hammer et al., 2002). Its descendant
haplogroups show a pattern of expansion and radiation from South Asia or the Middle East,
and the presence of its subclades in Africa might be the consequence of back migration
from South Asia or Southwest Asia towards Africa.

Fig.9. Y-DNA haplogroup F and its descendant haplogroups from G to T
(https://en.wikipedia.org)


2.7.4.1.4 Haplogroup G, H, IJ 


Haplogroup G originated approximately 17,000-21,000 years ago in the
Middle East and is prevalent in several ethnic groups of Eurasia (Passarino et al., 2002;
Karlsson et al., 2006), while haplogroup H is found primarily in South Asia, where it
originated around 30,000-40,000 years ago.
Haplogroup I is of European origin, having arisen before the Last Glacial Maximum from its
ancestor haplogroup IJ, which is said to have migrated there from the Middle East around
30,000-40,000 years ago. It bifurcated into I1 and I2 approximately 28,000 years ago
(Battaglia et al., 2009). Haplogroup J derived from IJ in the Near East or West Asia around
42,000 years before present and spread to India, Pakistan, Central Asia, Europe and
North Africa as a consequence of either Neolithic expansion or episodic migrations
(Semino et al., 2004; Shou et al., 2010).


2.7.4.1.5 Haplogroup L 


This haplogroup descended from haplogroup K approximately 30,000 years ago. It
is found at high frequencies in various South Asian populations, including Pakistani and
Indian populations. Three sub-branches of this haplogroup, L1, L2 and L3, have been
found in Iranian and Pakistani populations, while only L1 has been found in the Indian
population (Sengupta et al., 2006).


2.7.4.1.6 Haplogroup O 


This haplogroup originated in East Asia approximately 35,000 to 40,000 years ago
and expanded into Southeast Asian populations (Yan et al., 2011). A small percentage
of this haplogroup is found in South Asian populations.


2.7.4.1.7 Haplogroup T 


Previously this haplogroup was known as K2 (Mendez et al., 2011). It is said to have
originated in the region between western and southwestern Eurasia, i.e. between the
Himalayas and Germania, approximately 40,000 years ago (Hallast et al., 2015). According to
several other investigations, K2 or T originated in Asia and later migrated to North Africa
(Underhill et al., 2001; Cruciani et al., 2002; Samino et al., 2002; Luis et al., 2004). It is
present at low frequency in South Asian populations.

The area under investigation covers the major part of the Peshawar valley and is the best
representation of the Pathan population of Pakistan. The area is unexplored with
respect to human biology, particularly molecular anthropology. Hence the PhD research
project presented here was started to establish a baseline for molecular anthropology,
with the objectives given in the introduction section.


Chapter 3


MATERIALS AND METHODS 

3.1 Materials and Methods 

Buccal swab samples were obtained from 374 unrelated healthy male donors: 75 each
from the Muhammadzai, Mohmand and Kakakhel Mian populations of Charsada District
and the Yousafzai population of Mardan District, and 74 from the Mohmand population of
Mardan District. All donors voluntarily provided their samples after signing informed
consent. Before sampling, volunteers were asked to rinse their
mouth in order to minimize the chance of contamination. Each
individual was given a collection cup containing 3 mL of 5% sucrose solution. The
purpose of the sucrose was to induce the production of ptyalin-carrying saliva in the
buccal cavity. Donors rinsed their mouth with the solution for 2 to 2.5 minutes; the sucrose
was ultimately digested and loose cheek epithelial cells mixed with the saliva. The resulting
solution was collected in the collection cup. Samples were then stored at -20 °C before
further processing.


3.2 DNA Extraction 

Genomic DNA was isolated from the buccal samples using the phenol-chloroform method
described by Marisi and Sergio (2007).

The solutions and reagents were prepared according to the protocol given in Appendix II.

Samples were transferred to labeled 1.5 mL Eppendorf tubes. The tubes were then
centrifuged at 7000 rpm for 5 minutes to pellet the buccal cells and debris. The
supernatant was poured off immediately to avoid pellet slippage. This step was repeated
two or three times to obtain the maximum quantity of pelleted cells.

In the next step, 300 µL of cell lysis solution [10 mM Tris, 0.5% SDS, 5 mM EDTA (pH 8.0)]
and 3 to 4 µL of Proteinase K (20 mg/mL) were added to each tube containing a cell pellet.
The mixture was vortexed at high speed for a few seconds to dissolve the pellet completely
in the lysis buffer, and the tubes were then incubated at 65°C for one hour. After
incubation, 500 µL of phenol-chloroform (1:1) was added to each
tube, which was centrifuged at 10,000 rpm for 15 minutes. The supernatant was then
transferred to a new Eppendorf tube, 500 µL of isopropanol (2-propanol) was added
and the tubes were incubated for 10 minutes at -20°C. The tubes were then centrifuged at
12,000 rpm for 10 minutes and the supernatant was discarded. Next, 50 µL of
70% ethanol was added to each tube, which was centrifuged at 7000 rpm for 5 minutes. This
wash was repeated after discarding the supernatant. Finally, after the pellet had been
washed with ethanol, the tubes were kept inverted on clean absorbent paper overnight to
let the pellet dry. The DNA pellet was resuspended in 50 µL of TE buffer [12 mM Tris (pH 8)
and 1 mM EDTA] and incubated for 10 minutes at 56°C to dissolve the pellet completely.
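The centrifugation steps above are given in rpm, which depends on the rotor used. For reproducibility on a different centrifuge, the speeds can be converted to relative centrifugal force with the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², where r is the rotor radius in cm. The sketch below is illustrative only: the rotor radius is not reported in this protocol, so the 8 cm value is a purely hypothetical placeholder.

```python
def rpm_to_rcf(rpm: float, rotor_radius_cm: float) -> float:
    """Relative centrifugal force (x g) from rotor speed and rotor radius."""
    return 1.118e-5 * rotor_radius_cm * rpm ** 2


# Hypothetical rotor radius; the protocol above does not state it.
ASSUMED_RADIUS_CM = 8.0

# Speeds used in the extraction protocol.
for rpm in (7000, 10000, 12000):
    rcf = rpm_to_rcf(rpm, ASSUMED_RADIUS_CM)
    print(f"{rpm} rpm is roughly {rcf:.0f} x g at r = {ASSUMED_RADIUS_CM} cm")
```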

3.3 Gel Electrophoresis 

The DNA isolated from all samples was run on an agarose gel for qualitative analysis.
A 1% agarose gel was prepared by dissolving 1 g of agarose in 100 mL of TAE buffer. The
mixture was heated in a microwave oven until it boiled. The solution was then cooled
to 40°C and 12 µL of ethidium bromide was added. The solution was poured
into the gel cassette with the combs in place to create wells and was kept at room
temperature until solidified. After solidification, the gel was transferred to the gel tank
containing running buffer (TAE buffer). 5 µL of sample mixed with 2 µL of loading dye was
loaded into each well, and the gel was run for 20 minutes at 80 volts. The bands were
visualized under UV light to confirm the presence of genomic DNA in the samples
(Fig.10).

Fig.10. Agarose gel electrophoresis picture of the isolated genomic DNA.


3.4 DNA Quantification

The DNA in each sample was quantified using a NanoDrop spectrophotometer before PCR
amplification. For samples with highly concentrated DNA, serial dilutions
were prepared before amplification.
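The dilution of a concentrated extract to a working concentration follows C1·V1 = C2·V2. The sketch below shows the arithmetic for a single-step dilution; the target of 0.5 ng/µL matches the template concentration used for Y-STR amplification in this chapter, while the stock reading of 25 ng/µL, the 50 µL final volume and the function name are hypothetical values chosen for illustration.

```python
def dilution_volumes(stock_ng_per_ul: float, target_ng_per_ul: float,
                     final_volume_ul: float) -> tuple:
    """Return (stock volume, diluent volume) in µL for a one-step dilution."""
    if target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("target concentration exceeds stock concentration")
    # C1 * V1 = C2 * V2, solved for V1.
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical reading of 25 ng/µL diluted to 0.5 ng/µL in 50 µL.
stock_ul, water_ul = dilution_volumes(25.0, 0.5, 50.0)
print(f"{stock_ul} µL DNA + {water_ul} µL water")  # 1.0 µL DNA + 49.0 µL water
```

For very concentrated extracts the required stock volume becomes too small to pipette accurately, which is one practical reason for diluting serially in two or more steps.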


3.5 Amplification of AMELY locus 


All samples were amplified for the AMELY locus before Y-STR amplification.
A 25 µL reaction mix was prepared for each 3 µL of template DNA (Table 1), and all
samples were amplified using the thermocycling conditions given in Fig.11.


Table 1. Components of AMELY PCR amplification.

3.5.1 Thermo cycling parameters of the PCR 





Thermal cycler was programmed with the following conditions, as summarized in
Fig.11.

i. Initial hot start at 94°C for 2 minutes (1 cycle)
ii. Denaturation at 94°C for 1 minute (35 cycles)
iii. Annealing at 60°C for 1 minute
iv. Extension at 72°C for 5 minutes
v. Final extension at 72°C for 5 minutes (1 cycle)
vi. Storage soak indefinitely at 4°C


Fig.11. Thermo cycling conditions of AMELY amplification.


3.6 Polyacrylamide gel electrophoresis (PAGE)


All PCR products were run on a polyacrylamide gel to confirm the efficiency of PCR
amplification. The polyacrylamide gel was prepared by mixing 4.8 mL dH2O, 0.65 mL 10X
TBE, 975 µL of 40% acrylamide mix, 16 µL of 25% APS and 16 µL of TEMED.
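The final composition of the gel follows directly from the volumes listed above. The short calculation below is a sketch that simply sums the stated volumes and derives the resulting acrylamide percentage and TBE strength; it assumes that volumes are additive and that "40%" refers to the concentration of the acrylamide stock.

```python
# Volumes from the recipe above, converted to mL.
COMPONENTS_ML = {
    "dH2O": 4.8,
    "10X TBE": 0.65,
    "40% acrylamide mix": 0.975,
    "25% APS": 0.016,
    "TEMED": 0.016,
}

total_ml = sum(COMPONENTS_ML.values())
# Final concentration = stock concentration * stock volume / total volume.
acrylamide_percent = 40.0 * COMPONENTS_ML["40% acrylamide mix"] / total_ml
tbe_strength_x = 10.0 * COMPONENTS_ML["10X TBE"] / total_ml

print(f"total volume: {total_ml:.3f} mL")
print(f"acrylamide: {acrylamide_percent:.1f}%")
print(f"TBE: {tbe_strength_x:.2f}X")
```

Under these assumptions the recipe corresponds to a gel of roughly 6% acrylamide in about 1X TBE.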


All samples except two were AMELY positive and were taken forward to STR amplification
with no risk of deletion of the AMELY locus along with four important Y-STR loci, i.e.
DYS570, DYS456, DYS576 and DYS481. A representative picture of the PAGE results is
provided in Fig.12.

Fig.12. Polyacrylamide gel electrophoresis photograph of AMELY PCR product. LD:
Ladder and MC: Male control


3.7 Y-STR amplification 


A total of 374 samples from five different Pathan populations of Charsada and Mardan
districts were amplified for 23 Y-chromosomal loci (DYS576, DYS389I, DYS448,
DYS389II, DYS19, DYS391, DYS481, DYS549, DYS533, DYS438, DYS437, DYS570,
DYS635, DYS390, DYS439, DYS392, DYS643, DYS393, DYS458, DYS385a/b, DYS456 and
Y-GATA-H4) using the Promega PowerPlex® Y23 System (PPY23, Promega Corporation,
Madison, WI). The relative positions of the 23 STRs on the Y chromosome are shown in Fig.13.

Fig.13. Relative positions of the 23 Y-STR loci used in the Promega PowerPlex® Y23
System.


3.7.1 Procedure 


All pre-amplification components were thawed just prior to use. All tubes were
centrifuged and vortexed for 15 seconds before each use. The number of reactions was
determined, including positive and negative controls, and 1 or 2 reactions were added to
that number. The PCR amplification mix was prepared by combining all the components of
the kit following the Promega PowerPlex® Y23 System protocol, as given in
Table 2.
The PCR amplification mix was vortexed for 5-10 seconds, and 9.4 µL of the mix was then
transferred to each labeled 0.2 mL tube. To each tube containing PCR amplification
mix, 1.5 µL of 0.5 ng/µL DNA template was added.
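The reaction count described above (samples plus controls plus one or two spare reactions) determines how much amplification mix to prepare. The sketch below shows only that bookkeeping. The per-reaction volume of 9.4 µL is taken from the text; the batch of 20 samples, the default of two spare reactions and the function name are hypothetical, and the split into individual kit components is not reproduced here.

```python
def master_mix_plan(n_samples: int, n_controls: int = 2, n_extra: int = 2,
                    mix_per_reaction_ul: float = 9.4) -> tuple:
    """Return (number of reactions, total amplification mix in µL).

    n_controls covers the positive and negative control; n_extra is the
    pipetting overage of 1 or 2 reactions mentioned in the procedure.
    """
    n_reactions = n_samples + n_controls + n_extra
    return n_reactions, round(n_reactions * mix_per_reaction_ul, 1)


# Hypothetical batch of 20 samples.
reactions, total_ul = master_mix_plan(20)
print(f"{reactions} reactions, {total_ul} µL of amplification mix")
```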


Table: 2. Components of Promega PowerPlex® Y23 amplification system. 





3.7.2 Thermo cycling parameters of the PCR 


The thermal cycler was programmed with the following conditions (Fig.14).
i. Initial hot start at 96°C for 2 minutes
ii. Denaturation at 94°C for 10 seconds (28 cycles)
iii. Annealing at 61°C for 1 minute
iv. Extension at 72°C for 30 seconds
v. Final extension at 60°C for 20 minutes
vi. Storage soak indefinitely at 4°C

Fig.14. Thermal cycling protocol for the PPY23 System using an Eppendorf Mastercycler ep
Gradient S thermal cycler.
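
The programmed hold times of the cycling conditions can be added up to estimate the minimum
run time. The sketch below is illustrative only: it assumes that the 28 cycles apply to the
denaturation, annealing and extension steps together, and it ignores ramping between
temperatures and the final 4°C soak. The variable and function names are made up for the
example.

```python
# Each step is (label, temperature in degrees C, duration in seconds).
HOT_START = ("initial hot start", 96, 120)
CYCLE_STEPS = [
    ("denaturation", 94, 10),
    ("annealing", 61, 60),
    ("extension", 72, 30),
]
N_CYCLES = 28  # assumed to cover all three cycle steps
FINAL_EXTENSION = ("final extension", 60, 1200)


def programmed_hold_seconds():
    """Sum of all programmed holds, excluding ramp times and the 4 degree soak."""
    per_cycle = sum(duration for _, _, duration in CYCLE_STEPS)
    return HOT_START[2] + N_CYCLES * per_cycle + FINAL_EXTENSION[2]


if __name__ == "__main__":
    total = programmed_hold_seconds()
    print(total)                  # 4120
    print(f"{total / 60:.1f}")    # 68.7
```

Under these assumptions the programmed holds add up to 4120 seconds, a little under 69
minutes.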


3.8. Polyacrylamide Gel Electrophoresis


Before moving on to the next step, the PCR products were checked by Polyacrylamide gel
electrophoresis (PAGE) to confirm that the amplification had worked (Fig.15).

Fig.15. Polyacrylamide gel electrophoresis photograph of 23 Y-STR loci amplified with
the PPY23 System.


3.9 Capillary Electrophoresis 
After confirmation, capillary electrophoresis was carried out for fragment analysis using
an Applied Biosystems 3730 Genetic Analyzer.


3.9.1 Sample Preparation 
Sample preparation for the ABI 3730 Genetic Analyzer began by thawing each of the 5 dye
matrix standards (Fluorescein, JOE, TMR-ET, CXR-ET and CC5) and diluting them in the
ratio of 1:10 with nuclease-free water. The fragment mix was then prepared.

15 uL of each of the 5 diluted dye matrix standards was mixed with 1425 uL of Hi-Di™
formamide. 96 wells were used on 96 capillaries for matrix detection on the ABI PRISM®
3730 Genetic Analyzer. 25 uL of fragment mix was loaded into each well together with 1 uL
of sample, and 1 uL of PowerPlex® Y23 allelic ladder mix was added to one of the wells;
the plate was then centrifuged for a few seconds to remove bubbles.

The PCR product was denatured at 95°C for 3 minutes and then immediately chilled on
crushed ice for 3 minutes, just before loading onto the instrument.


3.9.2 Reading Allele calls


After capillary electrophoresis, the peak sizes and positions of the amplified alleles were
compared with the PowerPlex® Y23 allelic ladder using GeneMapper® Software 5. The range of
different alleles for the 23 Y-STR loci is given in fig.22.


3.10 Comparative populations 


To investigate the genetic diversity of the five major Pathan tribes of Charsada and
Mardan districts, and to compare them with other Pathan populations in the neighboring
areas of Mohmand agency and Swat valley and with populations from neighboring
countries, their Y chromosome genotype data were analyzed.

For Y-STR analysis, the comparative population data were obtained from the
published literature for Afghanistan [Pathan] (Lacau et al., 2012), Iraq [Iraqi] (Purps et al.,
2014), Iran [Arabs, Bakhtiari, Galaki, Mazandarani, Southern Talysh] (Rower et al., 2009),
Turkey [Dogukoy, Eskikoy, Gocmenkoy, Merkez] (Alakoc et al., 2010), Greece [Greeks]
(Purps et al., 2014), Israel [Muslim Arabs] (Farnandes et al., 2011), Russian Federation
[Archangelskaja, Brianskaja, Ivanoskaja, Lipezkaka, Ryasankaja, Smolenjkaja,
Tamboskaja, Tverskaja, Wologodskaja] (Rower et al., 2008), India [Tamil] (Balamurugan
et al., 2010), Xinjiang China [Uighur, Kazakh] (Shan et al., 2014), Tibet China [Tibetan]
(Ye et al., 2015), Pakistani Pathans (Lee et al., 2014) and Yousafzai Pathans (KP) Pakistan
(Ilyas et al., 2012).


3.11 Mitochondrial DNA amplification 
Out of 374 samples, 165 samples from all five populations were amplified with the primers
15971F (TTAACTCCACCATTAGCACC) and 484R (TGAGATTAGTAGTATGGGAG), and a sequence
of 1082 bp was amplified in a 25 uL total reaction volume. The amplification was carried
out using an Eppendorf Mastercycler ep Gradient S thermal cycler (Fig.16).


The recipe is given in table: 3.


Table: 3. Reagents used in PCR reaction mixture

Reagent                             Volume
F Primer (5 uM)                     1 uL
R Primer (5 uM)                     1 uL
Platinum Taq polymerase (5 U/uL)

[Figure: thermal profile with segments {1 cycle}, {35 cycles}, {1 cycle} and Hold; legible
labels: 54°C for 30 sec, 30 sec, 10 minutes]

Fig.16. Thermo cycling conditions for mtDNA amplification


3.12 PCR product clean-up 
5 uL of the PCR product was mixed with 2 uL of Exo-SAP-it in minitubes (SAP: Shrimp
Alkaline Phosphatase and EXO: Exonuclease I) (Malhi et al., 2010). After spinning, the tubes
were incubated at 37°C for 30 minutes, followed by heat inactivation at 80°C for 15
minutes, and were then held at 25°C (Fig.17).

Fig.17. Thermo cycling conditions for EXO-SAP.


The purpose of this procedure was to degrade excess primers and remove excess
dNTPs from the PCR product, so that the DNA sequences would be readable.


3.13. Agarose Gel Electrophoresis 


To visualize the PCR product and confirm that the amplification had worked without
multiple bands, the PCR product was electrophoresed in a 2% agarose gel. To prepare the
2% agarose gel, 3 g of agarose was added to 150 mL of 1X TBE buffer. The mixture
was boiled in a microwave oven for 1 minute 30 seconds, in 30-second intervals to avoid
spillage. The solution was then cooled to 45°C. After sufficient cooling, ethidium bromide
was added and the solution was poured into the gel cassette with the combs fitted
to create wells. After solidification, the combs were removed and 5 uL of each PCR
product was mixed with 2 uL of loading dye. Each sample was loaded in a separate
well. A 2-Log DNA ladder (0.1 - 10.0 kb) was loaded in the first well for comparison.
A representative gel result is given in Fig.18.

Fig.18. Agarose Gel Electrophoresis photograph of amplified mtDNA. MC: Male control,
NC: Negative control


3.14 mtDNA Sequencing 
For sequencing of mitochondrial DNA hypervariable region I (HVI), 2 uL of good-quality
clean PCR product was mixed with 1 uL of 5 uM 15971F primer. Sequencing
was carried out using the BigDye™ Terminator cycle sequencing kit of Applied Biosystems
on a 3730 Genetic Analyzer.


3.14.1 Sequence reading 


After retrieval, sequences were read using the program Sequencher 4.7 and all sequence
data were aligned with the revised Cambridge Reference Sequence (rCRS; Andrews et
al., 1999) using the BioEdit program. Variable positions were determined from np 16024 -
16365 and a haplogroup was assigned to each sample using PhyloTree build 16 (Van Oven
M. 2009) and MITOMASTER (Lott et al., 2013).
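
Differences from the rCRS are reported in this study in a compact notation such as T16311C
(substitution), C16193CC (insertion) and T16189d (deletion). The sketch below shows one way
such strings could be parsed, restricted to np 16024 - 16365 and sorted into transitions,
transversions and indels. It is an illustrative helper only, not the procedure used by
Sequencher, BioEdit or MITOMASTER; the function names and the regular expression are
assumptions made for the example.

```python
import re
from collections import Counter

# Notation as used in the SNP lists of this study, e.g. T16311C, C16193CC, T16189d.
VARIANT = re.compile(r"^([ACGT])(\d+)([ACGT]+|d)$")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
HVI_START, HVI_END = 16024, 16365  # range scored in the text


def classify(variant):
    """Return (position, kind) for one variant string relative to the rCRS."""
    match = VARIANT.match(variant.strip())
    if match is None:
        raise ValueError(f"unrecognised variant: {variant!r}")
    ref, position, alt = match.group(1), int(match.group(2)), match.group(3)
    if alt == "d":
        kind = "deletion"
    elif len(alt) > 1:
        kind = "insertion"
    elif ref == alt:
        raise ValueError(f"reference and variant base are identical: {variant!r}")
    elif {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        kind = "transition"      # purine <-> purine or pyrimidine <-> pyrimidine
    else:
        kind = "transversion"    # purine <-> pyrimidine
    return position, kind


def summarise(variants):
    """Count variant kinds, keeping only positions inside the scored HVI range."""
    kinds = Counter()
    for variant in variants:
        position, kind = classify(variant)
        if HVI_START <= position <= HVI_END:
            kinds[kind] += 1
    return kinds
```

A variant such as T16519C parses correctly but falls outside np 16024 - 16365, so
`summarise` does not count it.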

To investigate the genetic diversity of the five major Pathan tribes of Charsada and
Mardan districts, and to compare them with other Pathan populations in the neighboring
areas of Mohmand agency and Swat valley and with populations from neighboring
countries, their mtDNA HVI sequence data were analyzed.


3.15 Comparative populations 

For mtDNA analysis, comparative data from populations of India [Afridi, Tamil]
(Metspalu et al., 2004), Israel [Druze, Bedouin] (Hunley et al., 2009), Greece [Greeks]
(Irwin et al., 2008), Russian Federation [Yakut] (Hunley et al., 2009), Iran [Armenian,
Azeris, Persian, Qashqis] (Derenko et al., 2013), Iraq [Iraqi] (Abu et al., 2008), China
[Uighur] (Hunley et al., 2009), Afghanistan, Turkey [Turkish] (Comas et al., 1996) and
indigenous populations from Pakistan [Pathan, Kalash, Burusho, Baluch, Brahui] (Hunley
et al., 2009) were used.


3.16 Statistical analysis 

Genetic diversity estimates were calculated for all the five Pathan populations of the two
areas, separately as well as collectively, using the formula
$GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$, where $n$ is the number of samples and
$p_i$ is the frequency of the $i$-th haplotype in the population (Nei. 1987). Genetic
distances (matrix of Slatkin linearized FST) were obtained using ARLEQUIN Ver 3.5.2.2
software for our data and for other global populations. From all pairwise FST values a
distance matrix was constructed, which was later used to construct a PCoA plot using the
GENALEX program (Peakall et al., 2006). Median Joining (MJ) networks were constructed using
NETWORK 4.6.1.3 (http://www.fluxus-engineering.com) by giving each locus a weight
according to its estimated mutation rate (Keyser et al., 2003). MITOMASTER and PhyloTree
were used to predict the mitochondrial haplogroups (http://www.mitomap.org) (Lott et al.,
2013) and Y-haplogroups were identified using the online haplogroup predictor
(www.hprg.com). For STR analysis, the haplotypes were reduced to 17 STR loci (DYS19,
DYS385a/b, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448,
DYS456, DYS458, DYS635, Y-GATA-H4) for comparison with published global population data
sets.
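
The genetic diversity formula and the reduction to 17 loci can be illustrated with a short
Python sketch. It is not the software used in this study; it assumes that each profile is
held as a dictionary mapping a locus name to its allele, and the locus spellings and
function names are chosen for the example.

```python
from collections import Counter

# The 17 loci named in the text, with DYS385a/b and DYS389I/II written out.
SHARED_LOCI = (
    "DYS19", "DYS385a", "DYS385b", "DYS389I", "DYS389II", "DYS390",
    "DYS391", "DYS392", "DYS393", "DYS437", "DYS438", "DYS439",
    "DYS448", "DYS456", "DYS458", "DYS635", "Y-GATA-H4",
)


def reduce_profile(profile, loci=SHARED_LOCI):
    """Keep only the shared loci of one profile, as a hashable haplotype tuple."""
    return tuple(profile[locus] for locus in loci)


def haplotype_diversity(haplotypes):
    """Nei's diversity: n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two samples are needed")
    counts = Counter(haplotypes)
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Toy data: four samples, one haplotype seen twice.
    print(round(haplotype_diversity(["h1", "h1", "h2", "h3"]), 4))  # 0.8333
```

When every sample carries a different haplotype the estimate is 1, and when all samples
share one haplotype it is 0.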

3.17 Mantel test 

Finally, Mantel tests were performed using the GENALEX program (Peakall et al., 2006) to
measure the degree of correlation between mtDNA-based genetic distance among
populations and geographic distance, between Y-STR-based genetic distance and
geographic distance, and between mtDNA-based and Y-STR-based genetic distances.
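
The idea behind a Mantel test can be sketched as follows: the off-diagonal entries of two
distance matrices are correlated, and significance is judged by repeatedly permuting the
population labels of one matrix. This is a minimal illustration and does not reproduce the
GENALEX implementation; the one-sided p-value, the number of permutations and the function
names are assumptions made for the example.

```python
import random
from itertools import combinations
from math import sqrt


def _pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a distance matrix has no variation")
    return sxy / sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p) for two square symmetric distance matrices of equal size."""
    n = len(d1)
    pairs = list(combinations(range(n), 2))
    x = [d1[i][j] for i, j in pairs]

    def statistic(order):
        # Relabel the populations of the second matrix according to `order`.
        y = [d2[order[i]][order[j]] for i, j in pairs]
        return _pearson(x, y)

    order = list(range(n))
    observed = statistic(order)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if statistic(order) >= observed:
            hits += 1
    # One-sided p-value; the observed arrangement counts as one permutation.
    return observed, (hits + 1) / (permutations + 1)
```

Here `d1` could hold pairwise genetic distances and `d2` the geographic distances between
the same populations, listed in the same order.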


Chapter 4
RESULTS


This chapter presents the results of the Y-STR and mtDNA analyses, comprising genetic
diversity parameters for the Y-STR and mtDNA HVI data sets, haplogroup analysis,
network analysis, Analysis of Molecular Variance, Principal Coordinate Analysis and the
Mantel test.

4.1 The Y-STRs analysis

The Y-STR profiles obtained from 374 volunteers of the districts Mardan and Charsada
segregated into 344 haplotypes, and the 374 samples belonged to only 11 haplogroups
(Annexure I), with an overall Discrimination Capacity (DC) of 92%.
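
The reported DC can be checked with a one-line calculation, assuming the usual definition
of discrimination capacity as the number of distinct haplotypes divided by the number of
samples (the text does not spell the definition out). The function name is chosen for the
example.

```python
def discrimination_capacity(n_haplotypes, n_samples):
    """Distinct haplotypes divided by samples (assumed definition of DC)."""
    return n_haplotypes / n_samples


if __name__ == "__main__":
    dc = discrimination_capacity(344, 374)
    print(f"{dc:.3f}")  # 0.920
```

With 344 haplotypes in 374 samples this gives 0.920, in line with the 92% stated above.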


4.2 Genetic Diversity 

The genetic diversity (GD) obtained for all the five Pathan populations was 0.99, which is
similar to that of previously studied Pathans from Pakistan (Lee et al., 2014). Genetic
diversity estimates and other population genetics parameters for the different tribes are
given in table 4. A comparison of genetic diversity estimates among various indigenous
Pakistani populations is given in table 5.

Table: 4. Population genetics parameters for 23 Y-STRs in five Pathan populations of Charsada and Mardan district.

                                   CHARSADA                 MARDAN
Parameters                         (MZ)    (MM)    (KM)     (YS)    (MD)     All 5 populations
Number of samples (n)              75      75      75       75      74       374
No. of Haplotypes                  72      71      61       73      67       344
Unique Haplotypes                  70      67      51       71      62       320
Shared Haplotype                   2       4       10       2       5        24
Random Match Probability (RMP)     0.0136  0.014   0.019    0.0133  0.0158   0.00305
Power of Discrimination (PD)       0.986   0.985   0.980    0.986   0.984    0.996
Genetic Diversity (GD)             0.999   0.998   0.994    0.999   0.997    0.998

Note: The abbreviations MZ, MM, KM, YS and MD denote the tribes Muhammadzai, Mohmand from
Charsada, Kakakhel Mian, Yousafzai and Mohmand from Mardan, respectively.
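
The parameters in table 4 can all be derived from a plain list of haplotypes. The sketch
below is illustrative only and rests on assumptions that the text does not state
explicitly: a haplotype is counted as unique when it is observed once and as shared when it
is observed more than once, RMP is taken as the sum of squared haplotype frequencies, and
PD as 1 - RMP.

```python
from collections import Counter


def haplotype_summary(haplotypes):
    """Summarise a list of haplotypes (one entry per sample)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    unique = sum(1 for count in counts.values() if count == 1)
    shared = sum(1 for count in counts.values() if count > 1)
    rmp = sum((count / n) ** 2 for count in counts.values())
    return {
        "samples": n,
        "haplotypes": len(counts),
        "unique": unique,     # observed in a single sample
        "shared": shared,     # observed in two or more samples
        "rmp": rmp,           # assumed: sum of squared frequencies
        "pd": 1.0 - rmp,      # assumed: 1 - RMP
    }
```

In every column of table 4 the unique and shared haplotypes add up to the number of
haplotypes, as this counting scheme requires.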

Table: 5. Comparison of Y-STR Genetic diversity among various ethnic groups of Pakistan

Population (source)           No. of Sample   No. of haplotypes   Genetic Diversity
Path (present study)          374             344                 0.998
Mkr (Qamar et al. 2002)       33              30                  0.992
Pth (Lee et al. 2014)         230             211                 0.997
Bal (Qamar et al. 2002)       59              48                  0.988
Brh (Qamar et al. 2002)       110             85                  0.973
Haz (Qamar et al. 2002)       23              11                  0.893
Bsh (Qamar et al. 2002)       94              63                  0.987
Ksh (Qamar et al. 2002)       44              26                  0.952
Par (Qamar et al. 2002)       90              60                  0.974
Snd (Qamar et al. 2002)                                           0.995

Note: The abbreviations Path, Mkr, Pth, Bal, Brh, Haz, Bsh, Ksh, Par, Snd are Pathan from Charsada and Mardan,
Makrani, Pathan, Baluch, Brahui, Hazara, Burusho, Kalash, Parsi and Sindhi, respectively

4.3 Y-Haplogroups found


Analysis of the haplogroups assigned to the DNA samples revealed three major haplogroups:
around 49.2% of the individuals belonged to haplogroup R1a, 17.9% to haplogroup L and
9.6% to haplogroup G2a. The relative incidence of the haplogroups is presented in
table. 6.


Table: 6. Frequency of Y-haplogroups found in five populations under study and Pathan
population from Southern Afghanistan.

[Table body not recoverable from the source. Legible column headings: MZ (n=75, this
study), KM (n=75, this study), MM (n=75, this study), MD (n=74, this study) and Southern
Afghanistan (N=145) (Sangupta et al. 2007); cell values are frequencies of haplogroups (%).]

(Note: The abbreviations MZ, KM, MM, YS, MD denote the tribes Muhammadzai, Kakakhel
Mian, Mohmand from Charsada, Yousafzai and Mohmand from Mardan, respectively).


Haplogroups Q, H, J2b, J1, J2a, R1b, E1b1a and E1b1b were also present, comprising 7.2%,
4.2%, 1.87%, 3.2%, 1.33% and 1.06% respectively of the entire sample set investigated. Of
the haplotypes generated, 7% were shared among populations and almost 93% were unique to
individuals.

Muhammadzai (MZ), Kakakhel Mian (KM) and Mohmand (MM) samples came from the
Charsada area, and Mohmand (MD) and Yousafzai (YS) samples from the Mardan area. In total,
7% of the haplotypes were shared among the five populations of the two areas.


4.4 Network analysis of Y Haplogroups 


The network analysis of the haplogroups showed that haplotype “a” is the central
haplotype, with an overall frequency of 26.1%; it is present in 3.8% of MZ haplogroup R1a
haplotypes, 5.4% of MM R1a haplotypes, 7.1% of KM R1a haplotypes, 6% of MD R1a
haplotypes and 3.8% of YS R1a haplotypes. Similarly, haplotype “b” is the second haplotype
shared among all the five populations, with a frequency of 6.5% in MM, 1.6% in MZ
and MD and 0.5% in KM R1a haplotypes. Little haplotype sharing was observed in
haplogroups H and G2a (Fig.19 a).

No common or shared haplotype was observed among the samples of any population in
haplogroup H, while a single haplotype was shared between individuals of Mohmand from
Charsada (MM), Kakakhel Mian (KM) and Muhammadzai (MZ) in haplogroup G2a (Fig.19 b, c).

Fig.19.a. Median-Joining network of Y-haplogroup R1a in five populations of Charsada
and Mardan. Areas of circles are proportional to the haplotype frequencies. Haplotypes
“a” and “b” are the most frequent haplotypes shared among all the five populations.

Fig.19.b. Median-Joining network of Y-haplogroup L in five populations of Charsada and
Mardan. Areas of circles are proportional to the haplotype frequencies.

Fig.19.c. Median-Joining network of Y-haplogroup G2a in five populations of Charsada
and Mardan. Areas of circles are proportional to the haplotype frequencies.

4.5 Haplogroup distribution among the population of two districts
Charsada populations had a few additional haplogroups not found in Mardan
populations. This greater diversity of haplogroups points to a different history for this
location (Fig.20).

Fig.20. Y-Haplogroup distribution in five Pathan populations of Charsada and Mardan
district.

4.6 Genetic distance among five populations


No significant genetic distance (FST) was found between any pair of populations from the
two districts. The highest genetic distance, 0.04, was observed between Kakakhel Mian (KM)
and Mohmand (MD), followed by 0.03 between Kakakhel Mian (KM) and Muhammadzai (MZ)
and 0.02 between Mohmand (MD) from Mardan and Muhammadzai (MZ), while no distance
was observed between Muhammadzai (MZ), Mohmand (MM) from Charsada and Yousafzai (YS)
(Fig.21).
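
Pairwise FST values such as those quoted above are commonly linearized with Slatkin's
transformation, FST / (1 - FST), before being used as distances. The sketch below shows
that transformation only; treating negative estimates as zero is an assumption made for the
example, not a statement about how ARLEQUIN handles them.

```python
def slatkin_linearized(fst):
    """Slatkin's linearized distance FST / (1 - FST)."""
    if fst >= 1.0:
        raise ValueError("FST must be below 1")
    if fst < 0.0:
        fst = 0.0  # assumption: negative estimates are treated as zero
    return fst / (1.0 - fst)


if __name__ == "__main__":
    for fst in (0.04, 0.03, 0.02, 0.0):
        print(fst, round(slatkin_linearized(fst), 4))
```

For the small values observed here the linearized distances stay close to the raw FST
values, for example about 0.0417 for an FST of 0.04.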


[Figure: matrix of pairwise FST for the five populations]

Fig.21. Y-STR based pairwise genetic distances (FST) between five Pathan population
pairs of Charsada and Mardan district. (The abbreviations MZ, KM, MM, YS, MD denote
the tribes Muhammadzai, Kakakhel Mian, Mohmand from Charsada, Yousafzai and Mohmand
from Mardan, respectively).

4.7 Analysis of Molecular Variance among population of two districts


AMOVA for Y-STR haplotypes was used to examine the partitioning of genetic
variation among these five populations, divided into two groups by district. In this case
98.02% of the variation was found “within the populations”, while -0.57% was found
“among the groups” and 2.55% “among populations within the groups” (table: 7).
A negative value is interpreted as zero, i.e. no variation between the groups.


Table: 7. AMOVA design and results (average over 23 Y-STR loci) in five populations of
two districts (Groups).

Source of Variation                 Sum of Squares   Variance    Percentage
Among groups                        13.379           -0.04132    -0.56909
Among populations within groups     62.417           0.18488     2.54624
Within populations                  2601.665         7.11737     98.02285
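
The percentage column of an AMOVA table follows directly from the variance components: each
component is divided by their sum. The sketch below recomputes the percentages from the
components of table 7. The among-group component is hard to read in the source table and
is taken here as -0.04132, which is an assumption; the function name is chosen for the
example.

```python
def variance_percentages(components):
    """Express each variance component as a percentage of the total variance."""
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}


if __name__ == "__main__":
    table7 = {
        "among groups": -0.04132,                # sign assumed, see lead-in
        "among populations within groups": 0.18488,
        "within populations": 7.11737,
    }
    for name, pct in variance_percentages(table7).items():
        print(f"{name}: {pct:.2f}%")
```

With these components the within-population share comes out at about 98.02% and the
among-populations-within-groups share at about 2.55%.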

4.8 Allele Frequencies in all the five Populations


The number of alleles at the 23 different loci in all the five populations is graphically
represented in fig.22.

[Figure: “Number of alleles at different loci”; one panel each for MM, KM, MZ, YS and MD;
x-axis: Locus (1 to 23), y-axis: Number of alleles]

Fig.22. Graphical representation of allele frequencies of 23 loci in five populations of
Charsada and Mardan district. (The abbreviations MZ, KM, MM, YS and MD denote the tribes
Muhammadzai, Kakakhel Mian, Mohmand from Charsada, Yousafzai and Mohmand from Mardan,
respectively).

4.8.1 Yousafzai: YS (Mardan)


Allele frequencies of the 23 Y-STR loci in the Yousafzai population of Mardan district are
given in table.8. The most common allele was DYS456 (15), with a frequency of
0.720, while the least frequent alleles in this population were DYS576 (10, 15),
DYS19 (13), DYS389II (33), DYS481 (21, 27), DYS549 (15), DYS635 (20), DYS390 (20),
DYS643 (15), DYS393 (15), DYS385b (20, 21) and GATA H4 (10, 14), each with a frequency of
0.0133.

The most polymorphic locus in this population was DYS385b, with a total of 9 alleles,
and the least polymorphic was DYS19, with two alleles.
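
Allele frequencies of this kind are simple proportions: the number of samples carrying an
allele at a locus divided by the number of samples typed at that locus. The sketch below
illustrates the calculation and the search for the most polymorphic locus. It assumes that
each profile is a dictionary mapping a locus name to its allele; the toy profiles and
function names are made up for the example.

```python
from collections import Counter


def allele_frequencies(profiles, locus):
    """Frequency of each allele observed at one locus."""
    alleles = [profile[locus] for profile in profiles if locus in profile]
    n = len(alleles)
    if n == 0:
        return {}
    return {allele: count / n for allele, count in Counter(alleles).items()}


def most_polymorphic(profiles, loci):
    """Locus with the largest number of distinct alleles (first one on ties)."""
    return max(loci, key=lambda locus: len(allele_frequencies(profiles, locus)))


if __name__ == "__main__":
    toy = [
        {"DYS19": 14, "DYS456": 15},
        {"DYS19": 14, "DYS456": 15},
        {"DYS19": 15, "DYS456": 15},
        {"DYS19": 16, "DYS456": 16},
    ]
    print(allele_frequencies(toy, "DYS456"))           # {15: 0.75, 16: 0.25}
    print(most_polymorphic(toy, ["DYS19", "DYS456"]))  # DYS19
```

In a population of 75 samples, an allele seen once has a frequency of 1/75 = 0.0133 and an
allele seen 54 times has a frequency of 0.720, which matches the extremes reported above.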

Table: 8. Allele Frequencies of 23 Y-STRs in Yousafzai (YS) population of Mardan district.

[Table body not recoverable from the source. Legible fragments: two unlabeled frequency
columns (0.0667, 0.0267, 0.5200, 0.2133, 0.0800, 0.0400) and (0.0267, 0.0933, 0.2400,
0.4400, 0.1867, 0.0267), and a final row with one value per locus: 0.7715, 0.5254, 0.5716,
0.7297, 0.7151, 0.4995, 0.7481, 0.6123, 0.5121, 0.5157, 0.4926, 0.8159, 0.6782, 0.7139,
0.6681, 0.4760, 0.5768, 0.4739, 0.7485, 0.6058, 0.6941, 0.4440, 0.5438]

4.8.2 Mohmand: MD (Mardan)


In the Mohmand (MD) population of Mardan district, the most frequently occurring allele
was DYS533 (12), with the highest frequency of 0.6301, while the least frequent alleles
were DYS576 (21), DYS389II (27, 32), DYS19 (13), DYS549 (15), DYS635 (19, 20), DYS392
(15), DYS643 (14), DYS458 (20, 22), DYS385b (20, 22) and GATA H4 (10), each with a
frequency of 0.0137. The most polymorphic of the 23 loci in this population was DYS635,
with a total of 9 alleles (Table: 9).

Table: 9. Allele Frequencies of 23 Y-STRs in Mohmand (MD) population of Mardan
district.

4.8.3 Muhammadzai: MZ (Charsada)


Allele frequencies of all 23 Y-STRs in the Muhammadzai (MZ) population are given in
table: 10.

The most common alleles identified in this population were DYS392 (11) and DYS456 (15),
each with a frequency of 0.7067. The lowest allele frequency found among all the 23 loci
was 0.0133, for DYS389I (10), DYS448 (18, 22), DYS391 (12), DYS481 (18, 27, 28), DYS437
(17), DYS570 (11, 14, 22), DYS635 (19), DYS390 (26), DYS439 (14), DYS392 (15), DYS643 (8,
9, 13), DYS393 (11), DYS385b (13, 21), DYS456 (13) and GATAH4 (10, 14). The most
polymorphic of all the loci was DYS481, with 10 alleles ranging from allele number
18 to allele number 28.

Table: 10. Allele Frequencies of 23 Y STRs in Muhammadzai (MZ) population of Charsada district.

[Table body not recoverable from the source.]

4.8.4 Kakakhel Mian: KM (Charsada)


The allele frequencies of all the loci in the Kakakhel Mian (KM) population are given in
table: 11. The highest allele frequency found in this population was 0.7467 for DYS533,
while the lowest allele frequency recorded was 0.0133 for DYS576 (14,
15), DYS389II (20, 24, 26), DYS19 (17), DYS391 (9, 12), DYS481 (22), DYS533 (9), DYS438
(12), DYS635 (18, 20, 25, 26), DYS385a (10, 15), DYS385b (13), DYS456 (17) and GATAH4
(10). DYS635 and DYS385b were the most polymorphic of the 23 loci typed in this
population, with 8 alleles each.

Table: 11. Allele Frequencies of 23 Y STRs in Kakakhel Mian (KM) population of Charsada district.

4.8.5 Mohmand: MM (Charsada)


Allele frequencies for all alleles are given in table: 12. The highest allele frequency
in this population was 0.8243 for DYS393, while the lowest allele frequency
recorded was 0.0135 for DYS576 (21), DYS448 (18), DYS19 (17), DYS549 (14), DYS355 (9),
DYS481 (18), DYS438 (8), DYS570 (14), DYS635 (26), DYS392 (10), DYS385a (16),
DYS385b (20), DYS456 (17) and GATA H4 (14). The most polymorphic of all the loci in
this population was DYS570, with 10 alleles ranging from allele number 13 to allele
number 21.

Table: 12. Allele Frequencies of 23 Y STRs in Mohmand (MM) population of Charsada district.

4.9 Principal Coordinate Analysis

A Principal Coordinate Analysis (PCoA) plot was constructed using data from the five
populations of the study area and other neighboring populations available on
YHRD, with the following accession numbers: Afghanistan (Pathan): YC000226; Xinjiang
(Uighur): YA004122; Tibet China (Tibetan): YA004005; Xinjiang (Kazakh): YA003848;
Greece (Greeks): YC000125; India (Tamil): YC000123; Israel: YC000064; Iran: YA003782;
Iraq (Iraqis): YC000007; Russia: YC000079; Turkey: YC000296, YC000173, YA003467;
Pakistan (Yousafzai): YC000226; Pakistani (Pathan): YC000147. The clustering pattern
revealed that all the Pathan populations (including previously studied Pathan
populations) from Pakistan and Afghanistan cluster together, which shows great
homogeneity among them, along with a little genetic sharing with Russian populations
(Fig.23).

Fig.23. Y-FST based PCoA plot of five populations of Charsada and Mardan and
comparative global populations (The abbreviations MZ, KM, MM, YS, MD denote the tribes
Muhammadzai, Kakakhel Mian, Mohmand from Charsada, Yousafzai and Mohmand from
Mardan, respectively).

4.10 mtDNA HVI analysis 

A total of 93 haplotypes were identified in 165 samples (table: 15). Of these, 38.7%
were shared among the populations and 57 haplotypes (61.3%) were singletons.

4.10.1 Genetic Diversity
Genetic diversity in all the samples amplified for the HVI control region was 0.993 and
the match probability was 0.987 (table. 13).


Table 13. Comparison of mtDNA Genetic diversity among various ethnic groups of
Pakistan.

Population (source)          No. of Sample   No. of haplotypes   No of unique haplotypes   Genetic Diversity
Path (this study)            165             93                  37                        0.993
Mkr (Siddiqi et al. 2015)    100             70                  54                        0.968
Pth (Rakha et al. 2011)      230             157                 128                       0.993
Bal (Murci et al. 2004)      39              26                  18                        0.974
Brh (Murci et al. 2004)      38              22                  15                        0.952
Haz (Murci et al. 2004)      23              21                  19                        0.992
Bsh (Murci et al. 2004)      44              32                  29                        0.980
Ksh (Murci et al. 2004)      44              12                  5                         0.851
Par (Murci et al. 2004)      44              22                  12                        0.950
Snd (Murci et al. 2004)      23              21                  19                        0.992
Sar (Hayat et al. 2015)      85              63                  58                        0.957

Note: The abbreviations Path, Mkr, Pth, Bal, Brh, Haz, Bsh, Ksh, Par, Snd and Sar stand for Pathan,
Makrani, Pathan, Baluch, Brahui, Hazara, Burusho, Kalash, Parsi, Sindhi and Saraiki respectively


4.10.2 MtDNA statistics


The types and frequencies of variations observed in the mitochondrial HVI sequences of all
the five populations are given in table: 14.


Table: 14. mtDNA HVI statistics of five populations of Charsada and Mardan.

[Table body not recoverable from the source. Legible row labels: No. of Transitions,
No. of Transversions, No. of Substitutions, No. of Indels.]

Note: The abbreviations MZ, KM, MM, YS, MD and S.d stand for Muhammadzai, Kakakhel Mian,
Mohmand from Charsada, Yousafzai, Mohmand from Mardan and Standard deviation respectively.

Table: 15. List of SNPs defining mtDNA HVI Haplogroups in five Pathan populations of Charsada and Mardan district.


Samples  Macro Haplogroup  Haplogroup  Frequency  Origin  No. of variants  SNPs
KM01  R  U2e  4  S.Asian  7  A16051G, G16129C, A16182C, A16183C, T16189C, A16194C, C16197G


KM02 
KM04 


KM06 
G16474T 


KM08 
KM10 
KM11 


R 
R 
R 


H2(T152C T16311C) 
M3 
Jlblal 


U5a2 
U2c 
n 


A16293C, T16311C, G16391A, T16519C 


KM12 
KM13 
KM14 
KM15 
KM16 


KM17 
T16519C 


KM18 
KM20 
KM25 
KM27 


KM28 
T16519C 


KM30 
KM31 
KM32 
KM34 
KM35 
KM38 
MD02 
MD03 
MD04 
MD05 


MD07 
MD08 


R 




R6(G16129A) 
Jlc 

U6c 

M5al 

H1 

JIb5b 


U2elf 
M33 
M6alb 
R8ala3 
M4 


H6 

N7 

U2elh 

C4a 

R2 
H10((116093C)) 
H2a2a1d 
Nlalala 

Ula 

H2a3 


W4 





C.Asian 





C.Asian 





23 
17 
13 
15 





IndoEuropean
IndoEuropean
5
5


51 T16311C 
74 T16126C, C16223T, T16311C, T16519C 
57 C16069T, T16126C, G16145A, T16172C, C16222T, C16261T, 


15 C16192T, C16256T, C16270T, C16278T, G16526A 
22 A16051G, C16179T 
18 G16129A, A16180G, C16223T, A16254G, 


34 G16129A, G16213A, T16362C, T16519C 


16 C16069T, T16126C, C16168T, C16266T, C16278T, T16311C 
35 G16129A, C16169T, A16183C, T16189C, A16194C 
15 G16129A, C16223T, C16234T, C16291T, T16519C 


12 A16051G, C16239T 
27 C16069T, T16126C, G16145A, C16261T, T16263C, C16290T, 


A16051G, G16129C, A16182C, A16183C, T16189C, A16194T, C16221T, T16311C, T16325C, T16362C, 
111 C16447T 


14 C16223T, T16324C, T16357C, C16527T 
25 C16188T, C16223T, T16231C, T16362C, T16519C 
24 C16259T, C16292T, A16497G, T16519C 
37 T16086C, C16111T, G16145A, C16223T, C16261T, T16311C, 


12 T16362C, A16482G 
23 G16129A, C16223T, T16519C 
26 A16051G, G16129C, A16183C, C16193CC, T16362C, T16519C 
15 T16093C, G16129A, T16298C, C16327T, T16519C 
52 C16071T, T16519C 
34 T16093C, C16278T, G16319A, T16519C 

T16172C, A16182AC, A16183C 
C16147A, T16172C, C16223T, C16248T, C16320T, C16355T, T16519C 
A16051G, A16206C, T16311C 
T16093C, C16223T, C16234T, G16274A, T16519C 


T16086C, C16223T, C16286T, C16292T, T16519C 
C16192T, C16223T, A16284G, C16292T, T16519C 


MD09  R  H2a2a1g  6  4  T16172C, A16182AC, A16183C, T16189C
MD10  M  G2a1d2  1  4  C16223T, A16227G, C16278T, T16362C
MD12  R  U4a3  1  3  T16356C, T16362C, T16519C


MD13 
MD18 
MD20 
MD23 
MD25 
MD26 
MD29 
MD31 
MD34 
MD36 
MD38 
MD39 
MM01


MM02
T16519C 





MZ06 




U7 
U5al1b(T16362C) 
M5 
M65a(C16311T) 
M5a2ala 

H94 

N 

H5 

D4q 

M52a 

M71 

Tla 

P6 

M2a1 


H15a 
Tla1'3 
M49 
U2e3 
H13ald 
Y2 
H2a2b 
Hle 
R0a 
H4a 
M18b 
P4a 
H14b1 
C4a1(G16129A) 
M2b 
H1bt 
Flcla 
W6 








C.Asian 


C.Asian 





C.Asian 





13 


IndoEuropean


34 


6 


45 A16309G, A16318T, A16343G, T16362C, T16519C 
26 C16192T, C16256T, C16270T, T16362C, A16399G, T16519C 
23 G16129A, C16223T, T16519C 
25 T16126C, C16223T, A16289G, A16399G, T16519C 


14 G16129A, C16223T, A16265C, T16519C 
12 C16339T, C16355T 


63 C16223T, C16292T, T16519C 
43 C16266T, T16304C, T16519C 
16 C16223T, C16256T, T16311C, G16319A, T16362C, T16519C 
15 C16223T, A16275G, G16303A, G16390A, T16519C 


12  C16223T, T16271C 
16 T16126C, A16163G, C16186T, T16189C, C16294T, T16519C 
16 C16221T, T16311C, T16325C, T16362C, C16447T, T16519C 
17 T16075C, C16223T, C16270T, G16274A, G16319A, T16352C, 


21 C16184T 
26 T16126C, A16163G, C16186T, T16189d, C16294T, T16519C 
23 C16223T, C16234T, T16519C 
14 C16168T, C16234T, C16260T, T16362C 
11 C16234T 
15 T16126C, C16223T, C16266T, T16311C, T16519C 
12 C16291T, T16311C


23 G16129A, A16182AC, A16183C 
23 T16126C, T16362C, T16519C 
11 C16287T 
15 A16160G, C16223T, A16318T, T16325C, T16519C 
13 A16037G, C16111T, G16319A 


12 T16126C, T16519C 
45 G16129A, C16223T, T16298C, C16327T, T16519C 
11 C16169CC 
16 C16292T, C16355T, T16406d, A16497G, T16519C, C16527T 
C16111T, G16129A, T16304C 
C16192T, C16223T, C16292T, T16325C, C16465T, T16519C 


MZ11 R 
MZ16 R 
G16391A, C16447T, T16519C 
MZ22 R 
MZ24 R 
MZ27 M 
MZ28 R 
T16519C 

MZ33 R 
MZ34 R 
MZ36 R 
YS01 M 
YS03 R 
A16300G, A16326G, C16353T 
YS04 R 
YS06 R 
YS07 R 
YS09 R 
YS11 R 
YS12 M 
YS13 R 
YS19 R 
YS20 M 
YS21 M 
C16327T, T16357C, T16519C 
YS23 R 
YS24 R 
YS30 R 
YS31 N 
YS36 L3 
YS37 

YS38 R 
T16362C 


H13alald 


I6a 


H3p 
J1b3 
M39a 
T1 


H14a 
T2e 
U2b2 
M2ala 
H2a2alc 


JT 

HV2 
Ula2 
Jlb4a 
H17al 
C4a2'3'4 
Ulala 
T2b2b 
M3c2 
C4a4b 


H1k 
J1b 

R 
N1b1 
L3 
M38d 
R6b 





C.Asian 





C.Asian 





C.Asian 


E.African 


16 T16086C, G16145A, C16173T, C16261T, T16311C, T16519C
18 G16129A, C16223T, A16293C, T16311C, T16362C, 
13 C16069G, C16222T, A16269G 
16 C16069T, T16126C, G16145A, C16222T, A16235G, C16261T 
11 C16353T 
17 T16093C, T16126C, A16163G, C16186T, T16189C, C16294T, 
13 C16256T, C16270T, T16352C 
15 T16126C, G16153A, C16294T, C16296T, T16519C 


12 A16051G, C16239T 


26 C16223T, C16270T, G16319A, T16352C, C16449T, C16451T 
48 A16051G, T16086C, C16259A, C16267T, C16291T, 
13 T16086C, T16126C, T16519C 


12 T16217C, C16446T 
14 G16129A, T16189C, C16192T, A16202C 


16 C16069T, T16126C, G16145A, C16218T, C16261T, C16287T 
14 G16129A, C16223T, C16291T, T16519C 
16 C16223T, T16297C, T16298C, C16327T, T16357C, T16519C 
16 A16051G, T16154C, A16206C, A16230G, T16311C, T16519C 
15 C16111A, T16126C, C16294T, C16296T, T16519C 
14 T16126C, T16154C, C16223T, T16519C 
18 T16086C, G16129A, C16150T, C16223T, T16298C, 


14 A16051G, T16189C, C16290T, C16292T 
C16069T, T16126C, G16145A, C16261T, T16519C 
C16292T, A16497G, T16519C 
G16145A, C16176G, C16223T, C16256T, A16309G, G16390A, T16519C 
5 T16093C, G16129A, C16223T, A16305T, T16519C 
15 G16129A, C16223T, C16266T, T16311C, T16519C 
27 C16179T, A16227G, C16245T, C16266T, G16274A, C16278T, 


4.10.3 Mitochondrial DNA HVI Haplogroups found 

To characterize the maternal genetic variation among the populations, haplogroup frequencies were calculated. Among the 93 haplotypes, 63.4% belonged to haplogroup R, while haplogroups M, N and L were found at frequencies of 26.8%, 8.6% and 1.1%, respectively (Fig. 24).

Fig. 24. mtDNA HVI haplogroup frequencies among five Pathan populations of Charsada
and Mardan.
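
The frequency calculation described above amounts to counting haplogroup assignments and dividing by the total. A minimal sketch is given below; the function name and the toy list of labels are hypothetical and are not the study data.

```python
from collections import Counter


def haplogroup_frequencies(assignments):
    """Return {haplogroup: percentage} for a list of haplogroup labels."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {hg: 100.0 * n / total for hg, n in counts.items()}


# Hypothetical toy assignments, one macro-haplogroup label per haplotype.
toy_assignments = ["R"] * 6 + ["M"] * 3 + ["N"] * 1

for haplogroup, pct in sorted(haplogroup_frequencies(toy_assignments).items()):
    print(f"{haplogroup}: {pct:.1f}%")
```

The same function could be applied per population to compare frequencies between tribes.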


4.10.4 Network Analysis 


To obtain insight into the genetic structure of the studied populations, median-joining networks were constructed for all the major haplogroups, based on the frequency of haplotypes in the five populations, using the program NETWORK 4.6.1.3.

4.10.5 Macro Haplogroup R

Little sharing of haplotypes was observed among populations in haplogroup R. A “star haplotype” was found among the sequences of this haplogroup, with a frequency of 47.8% in MM, 17.4% in MD, 13.04% each in MZ and KM, and 8.7% in YS (Fig. 25a).


Legend: Muhammadzai (MZ), Mohmand (MM), Mohmand (MD), Kakakhel Mian (KM), Yousafzai (YS).

Fig. 25a. Median-joining network of mtDNA haplogroup R in five populations of Charsada
and Mardan. Areas of circles are proportional to the haplotype frequencies.


4.10.6 Macro Haplogroup M 


Very little sharing of haplotypes was observed in Hg M (Fig. 25b). No haplotype was shared among all five populations. Two haplotypes were shared between Mohmand from Charsada (MM) and Yousafzai (YS) from Mardan. Similarly, a single haplotype was shared between Kakakhel Mian (KM) and Muhammadzai (MZ).

Fig. 25b. Median-joining network of mtDNA haplogroup M in five populations of
Charsada and Mardan. Areas of circles are proportional to the haplotype frequencies.

4.10.7 Macro Haplogroup N

This haplogroup is present in 8.6% of the samples, and no haplotype sharing was recorded
in Hg N (Fig. 25c).


Legend: Muhammadzai (MZ), Mohmand (MM), Mohmand (MD), Kakakhel Mian (KM), Yousafzai (YS).

Fig. 25c. Median-joining network of mtDNA haplogroup N in five populations of
Charsada and Mardan. Areas of circles are proportional to the haplotype frequencies.

The mtDNA HVI Fst values among the five populations are shown in Fig. 26.

Fig. 26. mtDNA HVI-based pairwise genetic distance (Fst) among the five population pairs.
(The abbreviations MZ, KM, MM, YS and MD denote the tribes Muhammadzai, Kakakhel Mian,
Mohmand from Charsada, Yousafzai and Mohmand from Mardan, respectively.)


4.10.8 Analysis of Molecular variance (AMOVA) 


The proportion of genetic variation distributed within and between the populations of the two
districts was assessed by analysis of molecular variance (AMOVA). Variation among groups
accounted for -1.57% of the total, variation among populations within groups for 3.63%,
and variation within populations for 97.94% (Table 16).

Table 16. AMOVA (mtDNA HVI haplotypes) of five populations from two
districts (groups).

Source of variation                Degrees of   Sum of     Variance       Percentage of
                                   freedom      squares    components     variation
Among groups                       1            1.270      -0.02156 Va    -1.57
Among populations within groups    3            9.600      0.04977 Vb     3.63
Within populations                 182          244.445    1.34311 Vc     97.94
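
The AMOVA percentages follow directly from the variance components: each component is divided by the sum of all three. The sketch below reproduces that arithmetic with the Va, Vb and Vc values of Table 16. Va is taken as -0.02156, an assumed reading chosen to be consistent with the reported -1.57%, and the function name is illustrative only.

```python
def percent_of_variation(components):
    """Convert AMOVA variance components into percentages of total variance."""
    total = sum(components.values())
    return {source: 100.0 * value / total for source, value in components.items()}


# Variance components from Table 16 (Va is an assumed reading, see above).
variance_components = {
    "among groups (Va)": -0.02156,
    "among populations within groups (Vb)": 0.04977,
    "within populations (Vc)": 1.34311,
}

for source, pct in percent_of_variation(variance_components).items():
    print(f"{source}: {pct:.2f}%")
```

A slightly negative among-group component, as here, is usually read as no detectable differentiation between the two districts.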





4.10.9 Principal Coordinate Analysis


Fst values were calculated for all pairs of populations in Arlequin v.3.5.2.2 (Excoffier et al.,
2010). No significant genetic distance (Fst) was observed between population pairs except
between Kakakhel Mian and Mohmand. The PCoA plot provided information on the
degree of association between different indigenous and global populations, using
comparative sequence data from Greece, Russia, Iraq, Iran, Israel, Afghanistan, Xinjiang
(China), Turkey, South India, North Indian Afridi, Balouchi, Brahui, Brusho, Kalash and
Pakistani Pathan populations. The clustering pattern revealed that Mohmand (MD) and
Yousafzai from Mardan cluster with Balouchi and Pathans from Pakistan
and Afridi Pathans from India, while Muhammadzai and Kakakhel Mian from Charsada
cluster with Brahui and Brusho from Pakistan. Mohmand (MM) from Charsada, on the other
hand, clusters with the Turkish population, whereas Tamil from
South India and Kalash from Pakistan clustered separately (Fig. 27).

Fig. 27. mtDNA Fst-based PCoA plot of five populations of Charsada and Mardan and
comparative indigenous Pakistani and global populations. (The abbreviations MZ, KM, MM,
YS and MD denote the tribes Muhammadzai, Kakakhel Mian, Mohmand from Charsada, Yousafzai
and Mohmand from Mardan, respectively.)


4.11 Spatial correlation of data 


The Mantel test revealed no significant correlation between genetic distance, based on
either Y-Fst or mtDNA-Fst, and geographic distance. A very weak
positive correlation (Table 17) was recorded between Y-Fst and mtDNA-Fst values.
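
The logic of a Mantel test can be sketched as follows. This is a minimal illustration, not the software used in the study: it assumes a Pearson correlation between the upper triangles of two distance matrices and a one-sided permutation p-value. The two matrices are hypothetical toy values.

```python
import math
import random


def _upper(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a, b, permutations=999, seed=0):
    """Return (r, p): matrix correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    flat_b = _upper(b)
    observed = _pearson(_upper(a), flat_b)
    idx = list(range(len(a)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(idx)
        # Permute rows and columns of `a` together to keep it a distance matrix.
        permuted = [[a[i][j] for j in idx] for i in idx]
        if _pearson(_upper(permuted), flat_b) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical genetic (Fst) and geographic (km) distances for four populations.
toy_fst = [[0, 0.01, 0.03, 0.05], [0.01, 0, 0.02, 0.04],
           [0.03, 0.02, 0, 0.01], [0.05, 0.04, 0.01, 0]]
toy_km = [[0, 10, 30, 45], [10, 0, 20, 35], [30, 20, 0, 15], [45, 35, 15, 0]]

r, p = mantel(toy_fst, toy_km, permutations=199)
print(f"r = {r:.3f}, p = {p:.3f}")
```

With only five populations there are few distinct permutations, so p-values from such a test are coarse.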


Table 17. Correlation between genetic distance based on Y-Fst or mtDNA-Fst and
geographic distance

Correlation coefficient    Y-chromosome / mtDNA    0.21    0.13
                           Y-chromosome /          -0.45





Chapter-5 


DISCUSSION 


Several Pathan tribes reside in Pakistan, and their origin and affinities are still an open question. The present research was carried out to characterize the Pathan populations of Mardan and Charsada, taking as a reference point the 17 STR loci with high discrimination capacity reported by Lee et al. (2014). That study reported that the 17 loci in Y-filer showed low haplotype diversity in Pathan populations of Pakistan, but that diversity increased considerably when additional informative STR loci were included. In the present study, increased haplotype diversity and considerably increased discrimination capacity were observed in the different Pathan populations of Charsada and Mardan with the use of 23 Y-STR loci, which indicates that the six additional loci in the Promega PowerPlex® Y23 System are very useful for genetic investigations of these populations. Pathans have generally been reported to have low haplotype diversity due to patrilocality (Vermeulen et al., 2009; Mohyuddin, 2001; Ghosh, 2011), but here increased haplotype diversity was observed among these tribes using 23 Y-STR loci. The most polymorphic locus was DYS570, a rapidly mutating (RM) locus; such markers are good candidates for population differentiation (Ballantyne et al., 2010; Ballantyne et al., 2012).

The genetic findings of Haber et al. (2012) show that the Pathans of Afghanistan exhibit genetic affinity with North and West India and that they split from the rest of the Afghans 4.7 kya, during the rise of the area's first civilizations in the Indus Valley and Bactria-Margiana. The main lineage in the Pathans of Afghanistan has its oldest coalescence time, 14 kya, in the Indus Valley.
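
The haplotype diversity and discrimination capacity compared above for different numbers of Y-STR loci can be illustrated with a minimal sketch. It assumes Nei's unbiased estimator for haplotype diversity and defines discrimination capacity as the number of distinct haplotypes divided by the number of samples; the text does not state which estimators were used. The sample tuples are hypothetical toy allele values, not study data.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i squared)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def discrimination_capacity(haplotypes):
    """Fraction of samples that carry a distinct haplotype."""
    return len(set(haplotypes)) / len(haplotypes)


def first_loci(samples, k):
    """Keep only the first k loci of each haplotype."""
    return [tuple(s[:k]) for s in samples]


# Hypothetical two-locus haplotypes for four males.
toy_samples = [(14, 17), (14, 18), (15, 17), (16, 19)]

for k in (1, 2):
    subset = first_loci(toy_samples, k)
    print(k, "loci:",
          f"HD = {haplotype_diversity(subset):.3f},",
          f"DC = {discrimination_capacity(subset):.2f}")
```

In this toy case the second locus separates two samples that share an allele at the first, so both measures rise, which is the effect the additional loci have in the comparison above.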


The Y-haplogroup frequencies among the Pathan populations of Mardan and Charsada districts show a blend of haplogroups of different origins: 71.3% of South Asian origin, 15.73% of Middle Eastern, Caucasian or Greco-Anatolian origin and 10.4% of Central Asian origin, besides a small or negligible percentage of haplogroups of African origin, e.g. haplogroups E1b1a and E1b1b, which are sub-Saharan African and Arabian specific haplogroups. The E1b1b1-M35 lineages in some Pakistani Pathans were previously traced to a Greek origin, brought by Alexander's invasions (Firasat et al., 2007). However, RM network analysis of E1b1b1-M35 revealed that Afghanistan's lineages were correlated with Middle Easterners and Iranians but not with populations from the Balkans. The Islamic invasion in the 7th century CE left an immense cultural impact on the region, with reports of Arabs settling in Afghanistan and mixing with the local population (Emadi, 2005). However, the genetic signal of this expansion is not clearly evident, as some Middle Eastern lineages such as E1b1b1-M35 were also found in Afghanistan.


A high frequency of Y haplogroup R1a was noted in all five populations. An almost equal frequency of this haplogroup has been observed in Pathans from Afghanistan, which indicates genetic similarity between the two (Lacau et al., 2012), and a similar frequency has also been recorded in Pakistani Pathans (Lee et al., 2014). South Central Asia, i.e. Pakistan and India (Underhill et al., 2010; Mirbal et al., 2009), is proposed as the most likely place of origin because in both regions R1a1a-M198 has been observed at frequencies of 50% or above. According to Underhill et al. (2010), the highest Y-STR haplotype diversity for the R1a1a lineage is observed in South Central Asia, with a coalescence time of 14 kya, suggesting that this region is the likely source of dispersal of the M198 mutation. Alternatively, Klyosov claims that haplogroup R1a1a-M198 originated in South Siberia about 20 kya, but the South Central Asian origin of this haplogroup was later well supported. Our results revealed frequencies of this haplogroup ranging from 41.3% to 54.6% among the five populations of this region, providing clear evidence for the local origin of this haplogroup.


Similarly, Y haplogroup L, the second most prevalent haplogroup, comprising 17.9% of the total haplotypes, has already been reported in Pakistani Pathans at a frequency of approximately 7% and at a frequency of approximately 2% among other Pakistani populations, whereas a high frequency of approximately 23% was reported in Pathans of Afghanistan (Firasat et al., 2007). Qamar et al. (2002), in their study of Y-chromosomal DNA variation in Pakistan, reported a 14% frequency of haplogroup L (there called haplogroup 28) in the overall Pakistani population, while a frequency of 5.9% was reported in Pakistani Pathans in 2014 (Lee et al., 2014); the haplogroup is said to be related to the arrival of agriculture in this area of the Indian subcontinent (Qamar et al., 2002). Haplogroup L-M20 was hypothesized to have originated in India or the Middle East approximately 30 kya (Wells, 2007). This marker, which is found at 25% in north Afghanistan and 4.8% in the south, has also been reported at high frequencies in the Kallar community of South India (48%) (Wells et al., 2001) and in the Druze population from Israel (35%) (Shen et al., 2004). Time estimates based on seven Y-STR loci within L-M20 lineages for north (14.6±7.3 kya) and south (17.8±8.4 kya) Afghanistan populations are intermediate to those of Pakistan (26.3±5.3 kya) and India (7.5±1.7 kya) (Lacau et al., 2012). Furthermore, Pakistan displayed higher haplotype variance (0.548) than India (0.118), suggesting that L-M20 most likely originated in what is today Pakistan rather than in India (Lacau et al., 2012). The findings of the present investigation support this argument and also support the dispersal of haplogroup L from its place of origin to southern Afghanistan.

Likewise, the highest frequencies of Y-chromosome haplogroup H-M69 are found in India and Pakistan, notably 4.1% in Burusho, 20.5% in Kalash, 4.2% in Pathan and 6.3% in other Pakistani populations (Thanseem et al., 2006; Qamar et al., 2002). It is also present at a high frequency of 25-35% in tribal populations and at a low frequency of 10% in Western India and Pakistan (Sahoo et al., 2006; Cordaux et al., 2004). A low frequency of Y-chromosome haplogroups R1b, E1b1b and J2, of predominantly Greek lineage, was observed in these populations and in the Pathan population of Afghanistan, which strengthens the idea of Firasat et al. (2007) of a limited Greek contribution to the Y-chromosome structure of Pathans. The E1b1b1-M35 lineages in some Pakistani Pathans were previously traced to a Greek origin, brought by Alexander's invasions (Firasat et al., 2007).

For mtDNA, most of the genetic variation was found within populations (97.94%) rather than among populations within groups (3.63%); the corresponding variance components were 1.34 within populations and 0.05 among populations. This relatively high heterogeneity among these Pathan populations shapes the mtDNA landscape and might be the result of unequal genetic influence received from geographically adjacent neighbors during different settlement events. In addition, some haplotypes are shared across tribal boundaries, which may be due to male-driven intermarriages.

The mtDNA macro-haplogroups recovered from the Pathans of Mardan and Charsada are R, M, N and L. The first three haplogroups are thought to have originated around 60,000-75,000 years ago in South Asia (Kivisild, 2003). Therefore, the presence of these haplogroups suggests a South Asian maternal origin of these populations. Macro-haplogroup L (sub-Saharan Africa) was detected in just 0.5% of subjects, i.e. in only one subject, and the East African haplogroups (L4, L5, L6 and L7) have not yet been detected in Pathans, whereas the countries of the Arabian Peninsula have comparatively high frequencies, such as Yemen (38%), Oman (16%) and Saudi Arabia (10%) (Abu-Amero et al., 2007). The most common South Asian haplogroups in Pathans are R (67.9%) and M (22.3%).

In the present study, the mitochondrial haplogroups of South Asian origin have the highest proportion, 37.6%, comprising M3 (7.5%), R2 (5.3%), U7, U2e (4.3%), M4, U6c (3.2%), U2c, M65a, M6a1b, U5a1b, M49, M5, M2a1a, M6a1b, R0a, U2e1h (2.2%) and U5a2, U2e3, U4a3, U2e1f, U2b2, U2a1a, U2a, M52a, M5a1, M33, M38d, M39a, M2b, M18b, M71 (1.1%). The second major group of haplogroups was of Southwest Asian or Middle Eastern origin (36.5%), including H2a2a1g (6.4%), J1b1a1, H2 (5.2%), H2a2a1c, H5 (4.3%), H10, R6 (3.2%), H15a, H1e, H2a2a1d, R8a1a3, J1b, J1b5b (2.2%). The third most prevalent mitochondrial haplogroups are of Central Asian origin (8.6%), followed by West Eurasian (6.5%), and East Asian and Indo-European (3.2%); the least frequent haplogroup was L (1.1%), of East African origin.

A high frequency of macro-haplogroup M has been recorded in Asia, particularly in India, Bangladesh, Nepal and Tibet, reaching 60 to 80% (Rajkumar et al., 2005). Mitochondrial haplogroup M3, a sub-clade of Hg M, has been found in South Asia, with its highest frequency in West India and Pakistan (Metspalu et al., 2004). It has been reported in the Pakistani Pathan population at a frequency of 7.8% (Rakha et al., 2011), and this study found it at 7.5%. This concordance of mtDNA markers may indicate a significant female contribution to the genetic structuring and socio-cultural expansion of the multiethnic Pathan population (Rakha et al., 2011). Moreover, this uniparental genetic landscape of mtDNA HVI data suggests that the Pathans carry a profound blend of genetic components from South Asia, the Caucasus, Europe, the Arabian Peninsula and Central Asia (Quintana-Murci et al., 2004). The majority of the autochthonous Pathan M lineages fall into the M5 and M3 sub-haplogroups, which have a characteristic South Asian origin.

Phylogeographic analysis has revealed that haplogroup M5a1 is the most abundant haplogroup in all Roma (Gypsy) populations, including Hungarian, Bulgarian, Iberian and Balkan Roma (ranging from 6 to 29%), where it constituted a founder lineage that spread across West Asia and whose origin might be traced back to the Indian subcontinent, particularly Pakistan (Mendizabal et al., 2011). In the five populations studied here it was found at just 1.1%.

Another mitochondrial DNA lineage, the West Eurasian macro-haplogroup N, has been shown to be the ancestor of numerous haplogroups prevailing in Europe, the Middle East, Asia and the Americas (Kivisild et al., 1999; Nasidze et al., 2008). It is present, though less common, in Pathans, with an overall frequency of 8.6% in their gene pool, which is in agreement with previous studies (Quintana-Murci et al., 2004; Rakha et al., 2011). Haplogroup R (a sub-clade of N) encompasses the majority of the West Asian and European haplogroups H, HV, J, T and U, which have varying frequencies in the five Pathan populations (Malmstrom et al., 2015). The overwhelming predominance of the R lineage (63.4%) in Pathans has a clear European and West Asian attribution; Paleolithic and Neolithic expansions of Caucasians that reached South Asia via the Iranian plateau, and possibly via Arabian Sea nautical routes, could be a possible explanation for this genetic influx (Stoljarova et al., 2016; Haber et al., 2016). Furthermore, haplogroup R0, comprising HV0, HV1 and HV2, was scarcely found in these five Pathan populations (1.15%) as compared to other Pakistani populations; its highest frequencies are in Balochi (10.3%) and Pathans (9.1%), and it is absent in Burusho and Hazara (0%) (Kivisild et al., 2002; Alvarez-Iglesias et al., 2009). However, neighboring countries exhibited low frequencies, such as Uzbek (2.4%) and Turkmen (2.4%), and in Turkish populations its frequency dropped to 0% (Rakha et al., 2011; Siddiqi et al., 2015).

Haplogroup H had a prevalence of 9.8% across the five Pathan populations, which differs from West Asia (45%) and the Near East (25%) (Palanichamy et al., 2015). Neighboring populations exhibit relatively high frequencies of this haplogroup, such as Iranian (14.3%), Uzbek (21.4%), Turkmen (22%) and Tajik (29.5%) (Whale, 2012; Derenko et al., 2013). Similarly, other Pakistani populations exhibited very high frequencies of haplogroup H, i.e. Sindhi (28%), Brahui (26.3%) and Balochi (20.5%), but moderate frequencies in the Burusho (12.3%) and Hazara (13%) populations (Quintana-Murci et al., 2004; Bhatti et al., 2016). Notably, under these assumptions it is hard to discern sequential gene flow or expansions at the population level, because the most recent migration, i.e. British colonization in the 18th century, could hold both early and derived lineages, which contributed significantly to the gene pool of the Pathans (Jons, 2016).

Besides the aforementioned haplogroups, analysis of haplogroup J revealed an interesting geographical and chronological distribution in Pathans. It was found at a frequency of 9.6%, reflecting a deep conglomeration of Jewish and European ancestry. It is noteworthy that the expansion of the Jewish community, particularly Ashkenazi, across the Roman Empire and the Iranian Plateau is dated to around 1000 BC (Mosk, 2013). However, on account of their expulsion from Western Europe during the 15th century, they subsequently dispersed into Eastern Europe and moved as far as Afghanistan, North West Pakistan and India (Behar et al., 2004; Shlush et al., 2008; Costa et al., 2013). Yet Pakistani and Indian Jewish communities are an early branch of the Jewish Diaspora, with several exclusive socio-cultural features, an intricate history and minor Middle East specific ancestry components (Oddie, 1991; Ostrer, 2001; Chaubey et al., 2016). Another distinguished mitochondrial super-haplogroup, U, which includes haplogroups U2, U5 and U7, was the most abundant in Pathans (21.8%).

Haplogroup U is widely distributed in Europe and the Near East, with a coalescence age of about 51,000-67,000 years before present (Roostalu et al., 2007). However, this widespread distribution, taken together with the presence of the deeply rooted associated haplogroup U2 in native South Asians, suggests that haplogroup U2 is well entrenched and divergent, and was presumably an early lineage of super-haplogroup U in South Asia (Barcaccia et al., 2015). This statement is further strengthened by the occurrence of haplogroup U2 in excavated human remains from southern Russia dated to around 30,000 years before present, i.e. during the Pleistocene expansion. The frequent occurrence of U2 in South Asian and European hunter-gatherer lineages points to some association between Paleolithic hunters and present-day native inhabitants (Krause et al., 2010). Another aforementioned mitochondrial lineage, U7, originated in the “Black Sea” zone between southeastern Europe and western Asia, dated to 51,000-67,000 years before present. It is markedly found in India (20%) and the Near East (10%) (Roostalu et al., 2007); it was found at 4.3% in these Pathans, 23% in North Indians, 10% in South Indians and 7.2% in Iran, while it is absent in Western Asian or Anatolian and Eastern European or Balkan populations (Palanichamy et al., 2015; Bahmanimehr et al., 2015; Chaitanya et al., 2015). This implies some degree of complex overlapping of haplogroups across Persia, Pakistan (KPK) and India during various migratory events, as an outcome of successive historical and geographic gene flow (Bahmanimehr et al., 2015). Besides the primary sub-haplogroups of U, one European-specific haplogroup, U5, exhibited a relative frequency of 3.3% in Pathans. Earlier investigations of ancient mtDNA proposed that haplogroup U5 was most frequent in Mesolithic and Neolithic Europeans. For example, a high frequency (65%) of U5 haplogroups has been found in European hunter-gatherer individuals (Malmstrom et al., 2009; Sanchez-Quinto et al., 2012; Cassidy et al., 2016).

The frequencies of Y-haplogroups and mtDNA haplogroups, grouped by origin, are almost similar in these Pathans, with a small difference. A plausible explanation for this difference may be the limited sample size typed for mtDNA analysis; a different picture could be expected with a larger sample size for the populations.

The frequencies of the mtDNA macro-haplogroups found in the populations genotyped in the present study are in close agreement with those already reported in Pakistani Pathans, with a little variation in the frequencies of sub-clades. Mitochondrial haplogroups HV9, M65, H1k, U2a1, M71, M52 and U5a1b were confined to the populations of Mardan (Yousafzai and Mohmand: MD) only, while haplogroups U7, M18 and P6 were restricted to the Charsada populations (Mohmand: MM, Muhammadzai and Kakakhel Mian) only and were absent in Mardan. The high frequencies of local lineages in these populations confirm their descent as a consequence of Paleolithic population expansion (McElreavey et al., 2005).

The pattern of Y-haplogroup distribution in Afghanistan and in the general Pathan population from Pakistan studied by Sangupta et al. (2006) shows great similarity. In the light of our results and their agreement with previously documented genetic information, it can be inferred that these populations are descended from Durrani Pathans in south and southwest Afghanistan, which is in agreement with the oral historical tradition that the potential paternal ancestors of these Pathan populations were Durrani, who were descendants of Hephthalites rooted in Qais (Allah et al., 2013). On the mtDNA Fst-based PCoA scatter plot, Mohmand from Charsada (MM) showed great genetic affinity with Turkey, Israel and Iran, positioned roughly in the middle of these three, suggesting an admixed maternal lineage from Turkey, Israel and Iran rather than Afghanistan. Similarly, Yousafzai and Mohmand (MD) from Mardan show relatedness with Afridi Pathans from West India. A logical explanation for the different clustering on mtDNA variation may be the different origins of the matrilines and patrilines of these populations. However, this difference between paternal and maternal origins is not confirmed, given the positive, although very weak, correlation between the mtDNA and Y-STR distance matrices. In the light of the great genetic similarity revealed among the populations studied, the results support the origin of three of the four Pathan tribes from a single patriline as documented in legend, i.e. from the “Sarban confederacy”, while the lack of Arabian Y clusters and mtDNA haplogroups in the Kakakhel Mian population refutes the Arabian descent of Kakakhel Mian. Another logical explanation for the different clustering pattern may be more frequent male than female movement between the two areas, which contributes more gene flow to the paternal gene pool and reduces differentiation among the males of these populations, while the strict practice of endogamy within the populations helps retain their genetic differentiation in the matrilines.


CONCLUSION 


On the basis of our findings, compared with the scientific information already available on Pathans, we conclude that the vast majority of the paternal gene pool of the Pathans from the Mardan and Charsada populations exhibits genetic uniformity with other Pathans from Pakistan and Afghanistan, displaying a mosaic gene pool with haplogroups shared with the Middle Eastern, Caucasian, South Asian and Central Asian gene pools. It can be inferred from our results that these populations derive from the Pathan populations of those areas of Afghanistan from which they migrated to their present settlements. Furthermore, the Pathans have maintained their genetic diversity irrespective of geographical location and sub-population or caste. A possible reason for this pattern is the common practice of endogamy, which restricts their intermixing with other ethnic groups. Moreover, our results are congruent with the above-stated hypothesis about the legendary Afghan ancestor, as these populations exhibit considerable homogeneity, attributable to descent from a single Y chromosome, i.e. that of Sarban. The study presented in this dissertation reports first-hand information on the Pathans of KP province, which can be extended to other areas of the region with the maximum analytical material and tools to establish intra- and inter-population affinities of the tribes and clans. This information can serve as a useful database for establishing a forensic baseline for the Pathan population of the region. For a complete elaboration of the phylogenetic position of the Pathan lineage, complete mitogenome sequence analysis and Y-chromosomal SNP analysis are required.

ETHNOGENETIC PROJECT, DEPARTMENT OF GENETICS, HAZARA UNIVERSITY, MANSEHRA
Tel: 0997-414131 Fax: 0997-530046

NOTE: This proforma is the property of the ETHNOGENETIC PROJECT (HEC),
Department of Genetics, Hazara University, Mansehra, Pakistan. This information will be
used for research purposes only and will be kept confidential.


Ref. #: 
Lab code: 
Date: 
DETAILS OF PARTICIPANT 
Name: Father's Name: 
Age: Sex: Native Language: 
Ethnic Group: Caste 


Collection point: 


Home Address: 


Biological Samples Collected
Dental impressions    Saliva


Consent of Participant
The researchers have informed me about the purpose of sample collection and their
use in human genetics research. I have provided samples voluntarily to improve
science in Pakistan.

Signature of Participant:


APPENDIX II


STOCK REAGENTS

Phenol:Chloroform Mixture (1:1)


For each sample, 200 µL of phenol and 200 µL of chloroform were used.


Lysis Buffer

500 mM Tris-base
250 mM EDTA
5% SDS
Proteinase K, 75 µg/mL of lysis solution
β-mercaptoethanol (14.4 M), 1 µL/mL of lysis solution


50X TAE buffer

M Tris-HCl, pH 8
0.5 M EDTA
Make up to 1 L with dH2O and autoclave


Bromophenol blue dye

0.1 g bromophenol blue
Dissolve
Adjust volume to 100 ml with dH2O, stir overnight
Adjust pH to 8.0
Filter through Whatman filter paper
Store at room temperature


10 mg/ml Ethidium bromide (EtBr)

Add 1 g of ethidium bromide to 100 ml of ddH2O
Stir for several hours until completely dissolved
Store wrapped in aluminum foil at 4°C


1 M Tris-HCl

For 1 L:
121.1 g Tris
800 ml H2O
~42 ml HCl
Autoclave for 20 minutes on liquid setting.


SDS 20%


Dissolve 20 g SDS (sodium dodecyl sulfate or sodium lauryl sulfate) in H2O to a total volume of 100 ml with stirring (it may be necessary to heat the solution slightly to fully dissolve the powder).

Filter-sterilize using a 0.45-µm filter.

Keep 10 ml of the 20% solution in a Falcon tube with liquids.


dNTP


Master stock: 100 mM.

Create a stock solution that is 2.5 mM for each dNTP.

Add 25 µl of each dNTP to 900 µl of H2O for a total stock solution volume of 1 ml. The
recommended final concentration for each dNTP in the PCR reaction is usually
200 µM. Therefore, for a 25 µl PCR reaction, add 2 µl of the above 2.5 mM stock
solution.
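
The dNTP dilution above can be checked with a short calculation. The sketch assumes four dNTPs (dATP, dCTP, dGTP and dTTP), which is what makes the stated volumes add up to 1 ml; the function name is illustrative only.

```python
def diluted_concentration(stock_conc, stock_vol, total_vol):
    """Concentration after bringing stock_vol of stock to total_vol (same volume units)."""
    return stock_conc * stock_vol / total_vol


# Working stock: 25 ul of each 100 mM dNTP (four assumed) plus 900 ul of water.
total_ul = 4 * 25 + 900
working_mM = diluted_concentration(100.0, 25.0, total_ul)

# PCR: 2 ul of the working stock in a 25 ul reaction, converted from mM to uM.
final_uM = diluted_concentration(working_mM, 2.0, 25.0) * 1000.0

print(f"working stock: {working_mM:.1f} mM of each dNTP in {total_ul} ul")
print(f"final in PCR:  {final_uM:.0f} uM of each dNTP")
```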


For Electrophoresis


6X Loading Dye Solution (from Fermentas)

10 mM Tris-HCl (pH 7.6)
10 mM EDTA
0.005% bromophenol blue
0.005% xylene cyanol FF
10% glycerol


TAE buffer, 50x

242 g Tris base
57.1 ml glacial acetic acid
37.2 g Na2EDTA·2H2O (2 mM)
H2O to 1 liter

This solution does not normally need to be sterilized. The Tris base and acetic acid
correspond to 40 mM Tris-acetate in the 1x working solution.


TBE buffer, 10x

108 g Tris base (890 mM)
55 g boric acid (890 mM)
40 ml 0.5 M EDTA, pH 8.0 (20 mM)
H2O to 1 liter

10x and 5x TBE tend to precipitate over time. If convenient, dilute to 2x or 1x immediately,
or stir continuously. This solution does not normally need to be sterilized.


SB buffer, 50X

0.5 M sodium hydroxide (NaOH), pH adjusted to 8.5 with boric acid (H3BO3).

OR

250 mM disodium borate decahydrate (Na2B4O7·10H2O)


DNA Ladder (Fermentas)

For:
GeneRuler DNA Ladder Mix
GeneRuler 1kb DNA Ladder Plus

Add 20 µl of DNA ladder, 20 µl of Loading Dye Solution and 80 µl of water.


Primer Dilutions for PCR and Sequencing

Dilute primers to 5 µl/ml for PCR and 10 µl/ml for sequencing.


Master stock, 100 µM

100 µM = X nmol lyophilized primer + (X × 10 µl molecular-grade H2O)

To determine the amount of H2O to add to the lyophilized primer, multiply the number of
nmol of primer in the tube by 10; the result is the volume of H2O, in µl, needed to make a
100 µM primer stock.

The original primer tubes are often used for this 100 µM stock.

Master stock primers newly suspended in H2O should be allowed to sit at room
temperature for 10 minutes before they are used for working stock dilutions. Mix well
before making working stock dilutions.


Working stock, 10 µM, 5 µM

Dilute the primer master stock in a sterile microcentrifuge tube 1:10 with molecular-grade
H2O.


Dilution Calculation Example

Final Volume × Final Concentration = Starting Concentration × X

Example: 100 ml × 5 µg/ml = (x ml) × 100 µg/ml

x = 5 ml

So add 5 ml of stock and 95 ml of H2O.
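
The dilution equation and the primer resuspension rule above can be written as two small helper functions. This is a minimal sketch; the function names are illustrative, and the 25 nmol primer amount is a hypothetical example.

```python
def stock_volume(final_volume, final_conc, stock_conc):
    """Solve final_volume * final_conc = stock_conc * x for x (volume of stock)."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be more concentrated than the final solution")
    return final_volume * final_conc / stock_conc


def resuspension_volume_ul(primer_nmol, target_uM=100.0):
    """Water volume in ul that gives target_uM from primer_nmol of dry primer."""
    # 1 uM equals 0.001 nmol per ul, so volume = nmol * 1000 / uM.
    return primer_nmol * 1000.0 / target_uM


# Worked example from the text: 100 ml at 5 ug/ml from a 100 ug/ml stock.
x_ml = stock_volume(100.0, 5.0, 100.0)
print(f"stock: {x_ml:g} ml, water: {100.0 - x_ml:g} ml")

# Hypothetical tube containing 25 nmol of lyophilized primer.
print(f"add {resuspension_volume_ul(25.0):g} ul of water for a 100 uM stock")
```

For the default 100 µM target, the second function reduces to the "multiply the nmol by 10" rule given above.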
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Annexure- 


1 List of 23 Y-STR Haplotypes and their respective haplogroups in five Pathan populations of Charsada and Mardan district. 
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